ABSTRACT


The problem of modeling, analysis and reconstruction of noisy and/or
distorted syntactic patterns is studied. Segmentation errors and primitive
extraction errors can be treated as syntax errors and defined in terms of
language transformation rules. Three types of error transformations are
defined on strings, namely substitution, insertion and deletion. The
parser constructed according to the grammar generating the strings and the
three types of transformations is called the error-correcting parser. This
technique is also extended to tree languages. In formulating
error-correcting tree automata (ECTA), five types of error transformations
on trees are defined, namely substitution, split, stretch, branch and
deletion. By using language transformations, the distance between two
sentences can be determined. A definition of the distance between a
sentence and a language is proposed. Based on this definition, a clustering
procedure is proposed in which error-correcting parsers are employed to
determine the distance between an input syntactic pattern and a formed
cluster, or language. Finally, using the error-correcting parsing
techniques, real data examples on texture modeling and discrimination are
presented.




CHAPTER  I 
INTRODUCTION 


1.1 Purpose

During the past decade, there has been an increasing interest in
pattern recognition. Most of the developments in the theory and
applications of pattern recognition use the statistical approach [1-3]. In
order to represent the structural information contained in the patterns,
the syntactic or structural approach has been proposed [4-5]. This approach
draws an analogy between the structure of patterns and the syntax of
languages. The precision of syntactic specification gives the recognition
procedure not only the capability of classifying patterns but also the
capacity of describing them. However, one of the weaknesses of this
approach is the problem of recognizing noisy patterns. Several approaches
have been used in dealing with noisy patterns, namely: stochastic grammars
or discriminant grammars, sequential parsing or partial parsing methods,
language transformations, and error-correcting parsers. The purpose of
this research is to develop error-correcting parsing algorithms suitable
for syntactic pattern recognition.

1.2 The Recognition of Noisy and Distorted Patterns

Using a syntactic approach to pattern recognition, a set of training
patterns is first analyzed. A pattern, according to its structure, is
divided into subpatterns. Subpatterns can be further divided into
sub-subpatterns, and so on. The basic element is called a "pattern
primitive." Linguistic notations are used to describe a pattern in terms
of primitives and relations between them, as in a sentence. The set of
sentences corresponding to the set of training patterns can be specified by
a generative grammar called a "pattern grammar." Non-terminals in the
pattern grammar represent the subpatterns, and terminals represent
primitives and possibly some relational symbols. The structure of patterns
is characterized by the production rules of the grammar.

In a recognition procedure, after preprocessing, segmentation and
primitive extraction, an input pattern is represented as a sentence; a
parser is then employed as a pattern recognizer. A parser is an algorithm
based on a given pattern grammar, G, that can produce a complete syntactic
description, in the form of a parse tree, of an input sentence if it
belongs to L(G), the language generated from G. A block diagram of a
syntactic pattern recognition system is given in Figure 1.1.

In a pattern classification problem, parsers are used to determine
the class membership of an input pattern. Grammars are constructed to
characterize each of the classes of patterns. An input pattern is then
parsed with respect to the pattern grammars one by one to decide which
language (class) it belongs to.

In practical applications, pattern distortion and measurement noise
often cause segmentation and primitive extraction errors, which ultimately
result in a noisy representation (sentence), that is, one that cannot be
successfully analyzed by the parser. The following situations may cause
the representation of a pattern to be noisy: (1) unpredictable distortions
and variations, (2) simplified grammars.


Figure  1.1  Block  diagram  of  a syntactic 
pattern  recognition  system. 


(1) Unpredictable distortions and variations. Normally, one would
like to construct a grammar which generates as much variety in patterns
as possible. The construction of a grammar is based on the a priori
knowledge available, e.g., the given set of patterns, or predictable noise,
distortion or variation of patterns. However, not all the distortions and
variations of patterns are predictable. This uncertainty may cause some
patterns to be rejected during recognition.

(2) Simplified grammars. In a classification problem, in order to
avoid any ambiguity caused by overlaps between languages, grammars may be
constructed to exclude some known patterns as well as some expected
distortions so as to be simpler and smaller. A trade-off has to be made
between the descriptive precision and the analysis efficiency of a
grammar. One may construct a large grammar (with a large number of
production rules) which generates a language that very closely matches
the given set of patterns, or a simpler grammar which does not generate
some of the known patterns but uses less parsing time and storage space.

Stochastic grammars have been suggested for resolving the uncertainty
of patterns [6-8]. With a probability assigned to each production rule,
the probability distribution of sentences generated from the stochastic
grammar can be used to model the probability distribution of patterns.
Normal patterns are discriminated from noisy patterns by their associated
probabilities. This approach requires a large amount of training data
in order to estimate a meaningful probability distribution of patterns.

Page and Filipski [9] propose the use of a discriminant grammar, which
is an extension of the stochastic grammar approach. The language
generated from a discriminant grammar is supposed to include all the
classes of patterns under consideration. In a discriminant grammar, a
number is associated with each production rule. By adding the numbers
corresponding to each use of a production rule in a derivation of a
sentence, a number associated with the sentence is derived. The language
is partitioned into decision regions by comparing this number to a
predetermined cut point. A discriminant grammar can also be used in making
a Bayes' decision between two stochastic languages. In this case, the
number assigned to each production rule is obtained from the probability
distribution of sentences. Therefore, the construction of discriminant
grammars faces a problem similar to that of stochastic grammars, namely,
that a very large amount of training data is necessary in order to
estimate a meaningful probability distribution.
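
The scoring step of a discriminant grammar described above can be sketched
as follows. This is only an illustration under stated assumptions: the
rule names, the weights, and the convention that scores at or above the cut
point fall in the first decision region are hypothetical and are not taken
from [9].

```python
from typing import Dict, List


def discriminant_score(derivation: List[str], rule_weight: Dict[str, float]) -> float:
    """Add the number attached to each use of a production rule in a derivation."""
    return sum(rule_weight[rule] for rule in derivation)


def classify(derivation: List[str], rule_weight: Dict[str, float], cut_point: float) -> str:
    """Compare the sentence score with a predetermined cut point.

    Assumption: scores at or above the cut point fall in region "class_1".
    """
    score = discriminant_score(derivation, rule_weight)
    return "class_1" if score >= cut_point else "class_2"


# Hypothetical weights for three production rules.
weights = {"S->aA": 1.5, "A->bA": -0.5, "A->b": 0.25}
# A derivation listed as the sequence of rules it uses (one rule is used twice).
derivation = ["S->aA", "A->bA", "A->bA", "A->b"]
score = discriminant_score(derivation, weights)
```

Because a rule contributes once per use, repeated rules shift the score
repeatedly, which is what lets the cut point partition the language.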

An interesting feature shared by stochastic grammars and discriminant
grammars is sequential parsing. Persoon and Fu [10] proposed a sequential
parsing algorithm for stochastic context-free grammars. Using a stopping
rule, only part of the input string needs to be scanned when a decision is
made. The classification rule used is Bayes' decision rule. Page and
Filipski [9] also proposed a scheme for sequential parsing of discriminant
grammars based on the sequential probability ratio test. With the
sequential parsing method, since only the left part of a string is
involved in the decision making process, one may construct the pattern
grammars in such a way that the most informative, distinctive subpatterns
and primitives are generated first. Consequently, the sequential
classification schemes demonstrate an error tolerance capability to some
extent. The error tolerance of sequential parsing is investigated in
Chapter 2 of this thesis.

Ali and Pavlidis [11] used a similar idea for the construction of
parsers for handwritten numerals. Finite state grammars are used for
the characterization of each class (numeral). A set of finite state
automata is designed to read primitives considered to be discriminating,
while unimportant primitives are neglected. Since the recognizer ignores
some irrelevant details extracted by the pattern description algorithms,
it can reduce recognition errors.

The use of language transformations for the representation of special
types of pattern distortions, such as scaling, rotation and replacement,
was suggested by Fu and Bhargava [12]. This concept also appears in Aho
and Peterson's paper [13], where error transformations for substitution,
deletion and insertion errors are defined. Aho and Peterson further
expand the original grammar to include the error transformations as
production rules. Based on the expanded grammar, an error-correcting
parsing algorithm was formulated for substitution, deletion and insertion
errors in general. The correction satisfies the minimum-distance
criterion.

The approach of using error-correcting parsers is the method used in
this research. A parser constructed on a given grammar, G, performs the
function of analyzing an input string, x. The analysis result is a
complete parse of x if x is in L(G), the language generated from G. From
the parse, a derivation tree which represents the structure of x can be
reconstructed. Suppose that x is not in L(G). Then the parser can, at
most, generate a partial parse. Therefore, for a given grammar, G, a
parser can be used to answer the membership problem; it can also be used
to describe the syntactic structure if the input sentence is in L(G) [14].

An error-correcting parser is designed to generate a complete parse even
if the input sentence is not in L(G). Hence, using an error-correcting
parser in a pattern recognition problem, a noisy pattern can be
successfully analyzed and recognized. Block diagrams are given in
Figures 1.2(a) and (b) to illustrate the function of a parser and an
error-correcting parser. A corrected pattern may further be reconstructed
from the parse. The use of error-correcting parsers in dealing with noisy
patterns has the following two advantages over other methods.

(1) Improvement of recognition performance under inadequate training.
The construction of grammars, manual or automatic, is an important part
of the design of a syntactic pattern recognition system. An elaborate
design gives better recognition performance, but such a design is certainly
data dependent. Inadequate inference procedures or insufficient training
data will result in a poorly constructed grammar, and consequently poor
recognition performance. The use of error-correcting parsers as pattern
recognizers will compensate for this difficulty in grammar construction.
Hence, even with a rather poorly constructed grammar, the use of
error-correcting parsers can give satisfactory recognition results.

(2) Correction of noisy patterns. When pattern grammars are
constructed from noise-free patterns only, noisy or distorted patterns can
be corrected by using error-correcting parsers.

1.3 Survey of Error-Correcting Parsing

The idea of using syntax rules in correcting program errors or
punctuation errors arose with the design of syntax-directed compilers.
Because the syntactic specifications are precise, syntax analysis not
only plays a central role in the organization of compilers, it also
provides error detection and recovery capability within the compiler.

Figure 1.2 Syntax analysis using a parser (a) and
an error-correcting parser (b).

Most error-recovery strategies take the point where parsing fails
to continue as the point of detection of errors [15-21]. A recovery
action is then applied to suppress the error so that parsing can be
resumed. Diagnostic information or corrections may also be generated at
this point. In [15], Irons attempts to supply some recovery action at
each point where an inconsistency is detected by a top-down parser. Part
of the input sentence is replaced, based on the context of the error and
the productions of the grammar, to allow parsing to continue. Similar to
Irons' idea, Gries [16] proposes more sophisticated error-recovery
strategies for top-down parsers. Using a precedence grammar, an error
is detected when no precedence relation exists between the incoming
terminal and the symbol at the top of the stack, or the phrase to be
reduced has no equivalent right-hand side of a production. Wirth [17]
proposes an error recovery algorithm that scans tables of error rules
for an entry which applies to the erroneous condition. The table of
error rules is based on the designers' knowledge of common programming
errors and appropriate recovery actions. Leinius [18] discusses strategies
of isolating the smallest potential phrase and then making the required
replacement. Graham and Rhodes [19] suggest a weighted minimum-distance
measure for finding a "closest fit" local correction when an error
recovery routine faces numerous choices of next move.

Error correction for compilers emphasizes early detection of
errors and the generation of accurate diagnostics. Nevertheless, error
detection may be arbitrarily delayed. In a study by Aho and Ullman [23],
it was concluded that a precedence parser will not always detect an error
as soon as a corresponding LR(1) parser. Repairing techniques for LR and
LL parsers are still subjects to be studied.

An alternative to the heuristic approach for error-correcting
parsing is the minimum-distance error-correcting technique. All the
potential errors and their corrections are recorded during parsing. After
the entire string has been processed, a derivation that satisfies the
minimum-distance criterion is generated. Aho and Peterson formulate
substitution, deletion and insertion errors in terms of error
transformations [13]. The distance between two sentences is defined as the
least number of transformations used to derive one from the other. The
original grammar is then expanded to include error transformations in the
set of production rules such that its generated language contains all the
possible erroneous sentences. The parsing algorithm is a modified Earley's
parser [34] with provisions added for the bookkeeping of the number of
error transformations used. During parsing, the potential derivation that
uses the least number of error production rules is placed in the parse
list. By the time parsing is completed, the minimum-distance correction is
also achieved. Instead of formulating error production rules, Lyon
proposes a scheme that puts all the possible corrections, such as the
substitutions, deletions and insertions of the currently scanned symbol,
in the parse list [24]. Setting limits on the number of local errors
and global errors is also suggested by Lyon to decrease the parsing time
and memory storage.

Although the minimum-distance error-correcting parsers (MDECP) use
a similarity criterion in their search for the syntactically correct
sentence, they are considered impractical from the compiler design point
of view. Compiler designers' interests are in methods that generate
accurate diagnostic information and continue parsing more than that of
automatic


A spelling error correction technique is proposed by Morgan [25].
He uses a heuristic method to search for a good match of the input string
from a table of code words. A more rigorous approach to finding a best
match for an input string from a finite set of strings is to use the
algorithm given by Wagner and Fischer [26]. Error-correcting parsers for
regular languages are proposed by Wagner [27] and Thomason [28],
respectively.

The second interesting application of error-correcting parsing is
in syntactic decoding systems where errors are caused by noisy
communication channels [29-32]. In modeling the randomness of noisy
channels, it is essential that the designed probabilistic model can be
applied to the syntactic processing of linguistic information. Bahl and
Jelinek [30] proposed a first-order Markov chain model for noisy channels
in which an input sequence $a_1 a_2 \ldots a_n$ can produce output
sequences $b_1 b_2 \ldots b_m$ of varying lengths. This is done by
associating with each input symbol $a_i$ a probabilistic finite state
machine. A null transition, self-loop transitions and transitions
producing output other than $a_i$ are added to the finite state machine
for modeling deletion, insertion and substitution errors. In Fung and
Fu's probabilistic deformation model, context-free languages with
substitution errors are considered [31].

Let $x = a_1 a_2 \ldots a_n$ be a string. The error occurring on a
symbol $a_i$ is assumed to be independent of its context in $x$.
Therefore, the probability that $y = b_1 b_2 \ldots b_n$ is an
error-deformed string of $x$ is defined as follows:

$$q(y|x) = \prod_{i=1}^{n} q(b_i|a_i)$$

where $q(b_i|a_i)$ is the probability that $b_i$ substitutes for $a_i$.
Let $L(G_1)$ and $L(G_2)$ be two languages. The maximum-likelihood
decision rule proposed by Fung and Fu is

$y \in L(G_1)$

$y \in L(G_2)$


Given a grammar G, the parser designed to search for the x that satisfies
the maximum-likelihood criterion, $\max_{x \in L(G)} q(y|x)$, where y is an
input string, is called a maximum-likelihood error-correcting parser
(MLECP).

A modified Cocke-Younger-Kasami parser is used by Fung and Fu. Thompson
[32] formulates error-correcting parsers for stochastic grammars in
Greibach normal form (GNF). The approach of using expanded grammars, in
which substitution, deletion, insertion and transposition errors are
added as error production rules, is also used here. The four types of
errors are treated separately. Four separate algorithms, each of which
copes with one type of error, are then combined to correct simultaneous
errors. Although Thompson gives no discussion of the complexity of this
ECP, he points out that the original top-down parser for GNF already has
exponential growth in computational complexity. Ambiguity caused by the
four expanded grammars is expected to increase this complexity. The
combined algorithm would further compound the time complexity with the
enormity of the bookkeeping problem. It is suggested that the algorithms
be combined serially to simplify the process. However, the use of serial
combination may not always give the maximum-likelihood correction.

The idea of using error-correcting parsers in the syntactic
recognition of noisy patterns has also been suggested [31,33]. Since
pattern recognition systems handle a variety of types of input data, such
as pictorial data [35-39], waveforms [40-43], speech patterns, program
schemes [44-45], or data files [46], the metalanguages used to describe
patterns can be sets of strings, trees or graphs. In addition to type 1,
type 2, and type 3 grammars, programmed grammars [47-48], transition
networks [49], tree grammars [50], graph grammars [45,51], web grammars
[52,53], array grammars [54-56], and many others have been used for
pattern analysis. An error-correcting parsing scheme for context-sensitive
grammars is proposed by Tanaka and Fu [57].

Error-correcting parsing for syntactic pattern recognition is still
at a beginning stage. Most existing pattern grammars do not have
corresponding error-correcting parsers. The similarity measure is a key
point in designing such a parser for a pattern recognition system. The
idea of using language transformations for the modeling of primitive
extraction and segmentation errors, and constructing a parser based on
the expanded grammar which includes transformation rules, provides a
global distance measurement. We shall use this approach in formulating
error-correcting parsers for stochastic and non-stochastic context-free
languages and tree languages. A minimum-distance error-correcting parser
for context-free programmed grammars is also presented.

1.4 Thesis Organization

In Chapter 2, the distance between two strings is measured in terms
of the three defined transformations, namely, substitution, deletion and
insertion transformations. This measurement provides a similarity measure
between syntactic patterns. Error-correcting parsers are formulated
based on a similarity criterion, e.g., the minimum-distance criterion.
Definitions of the distance between a string and a language are proposed.
A minimum-distance classification system using a modified error-correcting
parsing algorithm is presented. A similar approach is applied to a
stochastic model where stochastic languages, deformation probabilities and
the maximum-likelihood criterion are used.


The problem of error-correcting syntax analysis for tree languages
is studied in Chapter 3. Syntax errors on trees are defined in terms of
five types of transformation, namely, substitution, deletion, stretch,
split and branch. In the formulation of error-correcting tree automata
(ECTA), transformations made on each terminal symbol are added to the
automata in the form of transition functions. Two types of ECTA are
proposed: one for substitution errors, called SPECTA, and one for all five
types of errors, called GECTA. Real data examples of using SPECTA for
LANDSAT data interpretation and GECTA for character recognition are
presented.

Chapter 4 and Chapter 5 describe two potential applications of
error-correcting parsers in the area of pattern recognition. As the
distance between a sentence (a syntactic pattern) and a language (a
group of syntactic patterns) is defined, and its computation is implemented
by using error-correcting parsers, a clustering procedure for syntactic
patterns is proposed in Chapter 4. A character recognition experiment
is given as an illustrative example.

A syntactic model for the generation and the discrimination of
structured textures is described in Chapter 5. A texture pattern is
first divided into fixed-size windows. Windowed patterns belonging to
the same class of texture are then characterized by a tree grammar. The
uncertainty existing in texture patterns, e.g., local noise and structure
distortion, makes it impossible to characterize them fully by the
constructed grammars. Therefore, SPECTA's are used as texture
discriminators. Texture patterns generated by tree grammars are
illustrated. Discrimination results are also given.


CHAPTER  2 


ERROR-CORRECTING  PARSING  FOR  STRING  LANGUAGES 

2.1  Introduction 

In this chapter, a distance between two strings is first defined and
then extended to the distance between a string and a language. The
distance between two strings is defined by Aho and Peterson [13] in terms
of the minimum number of error transformations used to derive one from the
other. When the error transformations are defined in terms of
substitution, deletion and insertion errors, the distance measurement
coincides with the definition of the Levenshtein metric [61]. In
Section 2.2, error transformations are applied to the weighted Levenshtein
metric. Also, a new metric, simply called the weighted metric, which
reflects the difference between errors of the same type made on different
terminals, is proposed. This extension provides a similarity measure
between two sentences more closely related to the similarity of their
corresponding patterns.

For a given input string y and a given grammar G, a minimum-distance
error-correcting parser (MDECP) is an algorithm that searches for a sentence
z in L(G) such that the distance between z and y, d(z,y), is the minimum
among the distances between all the sentences in L(G) and y. The algorithm
also generates the value of d(z,y). We simply define this value to be the
distance between L(G) and y and denote it as $d_1(L(G),y)$.

When a given grammar is a context-free grammar (CFG), its MDECP can
be implemented by modifying Earley's parsing algorithm. We also extend
the definition of the distance between L(G) and y, $d_1(L(G),y)$, to the
definition of $d_K(L(G),y)$, the average distance between y and the K
sentences in L(G) that are the nearest to y. The computation of
$d_K(L(G),y)$ can be implemented by further modification of the MDECP
algorithm. In Section 2.2.3 a minimum-distance decision rule is proposed
for the classification of syntactic patterns.
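
As a rough illustration of the two distances just introduced, the sketch
below computes the distance to the nearest sentence and the K-nearest
average over a small, explicitly enumerated language and then assigns y to
the closest language. It is not the MDECP: a real grammar generally
generates infinitely many sentences, which is why the text relies on a
modified Earley parser. The languages `L1` and `L2`, the function names,
and the nearest-language rule itself are assumptions made for the example.

```python
from functools import lru_cache
from typing import Dict, List


def edit_distance(x: str, y: str) -> int:
    """Least number of substitutions, deletions and insertions turning x into y."""
    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        if i == 0 or j == 0:
            return i + j  # only insertions or only deletions remain
        return min(d(i - 1, j) + 1,                            # delete x[i-1]
                   d(i, j - 1) + 1,                            # insert y[j-1]
                   d(i - 1, j - 1) + (x[i - 1] != y[j - 1]))   # substitute or match
    return d(len(x), len(y))


def d_k(language: List[str], y: str, k: int = 1) -> float:
    """Average distance between y and its k nearest sentences in a finite language."""
    nearest = sorted(edit_distance(z, y) for z in language)[:k]
    return sum(nearest) / len(nearest)


def classify(languages: Dict[str, List[str]], y: str, k: int = 1) -> str:
    """Assumed minimum-distance rule: pick the language with the smallest d_k."""
    return min(languages, key=lambda name: d_k(languages[name], y, k))


# Hypothetical finite samples of two languages.
languages = {"L1": ["ab", "aabb", "aaabbb"], "L2": ["ba", "bbaa", "bbbaaa"]}
label = classify(languages, "aab")
```

With k = 1 the function `d_k` reduces to the distance to the single
nearest sentence; larger k makes the decision less sensitive to one
atypical sentence in a language.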

An MDECP algorithm for context-free programmed grammars (CFPG) is
given in Section 2.3. This algorithm is restricted to Levenshtein's
distance only. In pattern recognition, context-free programmed grammars
are considered to have higher descriptive power than context-free
grammars. It is proved by Rosenkrantz that the set of languages
generated by CFPG's properly contains the set of context-free languages,
and is properly contained within the set of context-sensitive languages
[60]. A CFPG generating a context-sensitive language may be selected in
order to describe the patterns effectively. Although a context-sensitive
grammar (CSG) may as well be used, parsing based on a CFPG has better
analysis efficiency than parsing based on a CSG.

In Section 2.4, the stochastic deformation model of substitution
errors proposed by Fung and Fu [31] is first extended to include deletion
and insertion errors. Based on the deformation model, the deformation
probabilities can be estimated from the observations of these errors.
Similar to the use of error transformations proposed in [13], the
stochastic deformation model is introduced into the original stochastic
context-free grammar (SCFG). Earley's parser is modified to search for
the most likely error correction based on the maximum-likelihood
criterion. We shall call this algorithm the maximum-likelihood
error-correcting parser (MLECP). For an input sentence y and a given SCFG,
$G_s$, an MLECP generates the most likely correction of y, x. Let p(x) be
the probability of x in $L(G_s)$. The MLECP also computes the value of
q(y|x)p(x), where q(y|x) is the deformation probability of y given x. We
shall interpret the term q(y|x)p(x) as the deformation probability of y
given $L(G_s)$, denoted as $q(y|G_s)$. If the a priori probability of each
grammar is known, Bayes' decision rule is proposed as a decision criterion.
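
The quantities in this paragraph can be illustrated with a small sketch.
It assumes the substitution-only deformation model (each observed symbol
depends only on the symbol it replaces) and a stochastic language given as
an explicit table of sentences and probabilities. The actual MLECP works
on an SCFG with a modified Earley parser and also covers deletions and
insertions; all names, probabilities and priors below are hypothetical.

```python
from math import prod
from typing import Dict, Tuple

# (observed symbol b, original symbol a) -> q(b|a); hypothetical values.
SubProb = Dict[Tuple[str, str], float]


def q_given_x(y: str, x: str, q: SubProb) -> float:
    """Substitution-only deformation probability: product of q(b_i|a_i)."""
    if len(x) != len(y):
        return 0.0  # substitutions alone never change the length
    return prod(q[(b, a)] for b, a in zip(y, x))


def q_given_grammar(y: str, language: Dict[str, float], q: SubProb) -> Tuple[str, float]:
    """Most likely correction x and q(y|x)p(x), maximized over an enumerated language."""
    best = max(language, key=lambda x: q_given_x(y, x, q) * language[x])
    return best, q_given_x(y, best, q) * language[best]


def bayes_decide(y: str, grammars: Dict[str, Dict[str, float]],
                 priors: Dict[str, float], q: SubProb) -> str:
    """Pick the grammar maximizing q(y|G) times its a priori probability."""
    return max(grammars, key=lambda g: q_given_grammar(y, grammars[g], q)[1] * priors[g])


q = {("a", "a"): 0.75, ("b", "a"): 0.25, ("b", "b"): 0.75, ("a", "b"): 0.25}
grammars = {"G1": {"ab": 0.5, "aabb": 0.5}, "G2": {"ba": 0.5, "bbaa": 0.5}}
correction, score = q_given_grammar("abbb", grammars["G1"], q)
```

In this toy setting the noisy string "abbb" is corrected to "aabb" under
G1, and a sufficiently skewed prior can still move the Bayes decision to
the other grammar.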

Due to the inefficiency of such an error-correcting parser, we are
also interested in the improvement of parsing speed. Persoon and Fu have
proposed a sequential classification algorithm (SCA) for stochastic
context-free languages [10]. The error-tolerance capability of SCA is
investigated in Section 2.4.4. We further modify the SCA to classify
noisy sentences using the error-correcting approach. Experimental results
illustrate that, within a tolerable percentage of misrecognition, the
SCA is faster than the MLECP.

2.2  Minimum-Distance  Error-Correcting  Parsing  for  Context-Free  Languages 

Following the notations used in [14], the definitions of grammars
and languages are briefly reviewed.

Definition 2.1. A grammar is a 4-tuple

$G = (N, \Sigma, P, S)$ where

(1) $N$ is a finite set of nonterminal symbols.

(2) $\Sigma$ is a finite set of terminal symbols disjoint from $N$.

(3) $P$ is a finite subset of

$(N \cup \Sigma)^* N (N \cup \Sigma)^* \times (N \cup \Sigma)^*$

An element $(\alpha, \beta)$ in $P$ will be written $\alpha \to \beta$ and
called a production.

(4) $S$ is a distinguished symbol in $N$ called the start symbol.

Definition 2.2. The language generated by a grammar G, denoted
L(G), is the set of sentences generated by G. Thus,

$L(G) = \{ w \mid w \in \Sigma^*, \; S \Rightarrow^* w \}$

where the relation $\Rightarrow$ on $(N \cup \Sigma)^*$ is defined as
follows: if $\alpha\beta\gamma$ is in $(N \cup \Sigma)^*$ and
$\beta \to \delta$ is a production rule in P, then
$\alpha\beta\gamma \Rightarrow \alpha\delta\gamma$; and $\Rightarrow^*$
denotes the reflexive and transitive closure of $\Rightarrow$.

If each production in P is of the form $A \to \alpha$, where A is in N and
$\alpha$ is in $(N \cup \Sigma)^*$, then the grammar
$G = (N, \Sigma, P, S)$ is classified as a context-free grammar. The set
of languages that can be generated from context-free grammars is called
the set of context-free languages.


2.2.1  A Similarity  Measure  for  Syntactic  Patterns 

In [13], errors in a string are considered to be of three types,
substitution, deletion and insertion errors, and are treated as syntax errors
by defining transformations from $\Sigma^*$ to a subset of $\Sigma^*$.

Definition 2.3. For two strings $x, y \in \Sigma^*$, we can define a
transformation $T: \Sigma^* \to \Sigma^*$ such that $y \in T(x)$. The following three
transformations are introduced:

(1) substitution error transformation

$\omega_1 a \omega_2 \vdash_{T_S} \omega_1 b \omega_2$, for all $a, b \in \Sigma$, $a \neq b$.

(2) deletion error transformation

$\omega_1 a \omega_2 \vdash_{T_D} \omega_1 \omega_2$, for all $a \in \Sigma$.

(3) insertion error transformation

$\omega_1 \omega_2 \vdash_{T_I} \omega_1 a \omega_2$, for all $a \in \Sigma$,

where $\omega_1, \omega_2 \in \Sigma^*$.

Definition 2.4. The distance between two strings $x, y \in \Sigma^*$,
$d^L(x,y)$, is defined as the smallest number of transformations required
to derive y from x.

Example 2.1. Given a sentence x = cbabdbb and a sentence
y = cbbabbdb, then

x = cbabdbb $\vdash_{T_S}$ cbabbbb $\vdash_{T_S}$ cbabbdb $\vdash_{T_I}$ cbbabbdb = y.

The minimum number of transformations required to transform x to y
is three; thus, $d^L(x,y) = 3$.
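As an illustration of Definition 2.4, the distance in Example 2.1 can be checked with the standard dynamic-programming recurrence for the Levenshtein distance. This is a minimal sketch; the function name `levenshtein` is ours and does not appear in the original text.

```python
def levenshtein(x: str, y: str) -> int:
    """Smallest number of substitutions, deletions and insertions turning x into y."""
    prev = list(range(len(y) + 1))          # distances from the empty prefix of x
    for i, a in enumerate(x, start=1):
        cur = [i]                           # deleting the first i symbols of x
        for j, b in enumerate(y, start=1):
            cur.append(min(prev[j - 1] + (a != b),   # substitution or match
                           prev[j] + 1,              # deletion of a
                           cur[j - 1] + 1))          # insertion of b
        prev = cur
    return prev[-1]


print(levenshtein("cbabdbb", "cbbabbdb"))   # the two strings of Example 2.1
```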

The metric defined in Definition 2.4 gives exactly the Levenshtein
distance of two strings [61]. A weighted Levenshtein distance can be
defined by assigning nonnegative numbers $\sigma$, $\gamma$ and $\delta$ to transformations
$T_S$, $T_D$ and $T_I$ respectively. Let $x, y \in \Sigma^*$ be two strings, and let J be
a sequence of transformations used to derive y from x; then the weighted
Levenshtein distance between x and y, denoted $d^w(x,y)$, is

$d^w(x,y) = \min_J \{\sigma \cdot k_J + \gamma \cdot m_J + \delta \cdot n_J\}$     (2.1)

where $k_J$, $m_J$ and $n_J$ are the numbers of substitution, deletion, and insertion
transformations respectively in J.

We shall propose a weighted metric that reflects the difference
between errors of the same type made on different terminals. Let the weights
associated with error transformations on terminal a in a string $\omega_1 a \omega_2$,
where $a \in \Sigma$ and $\omega_1, \omega_2 \in \Sigma^*$, be defined as follows:

(1) $\omega_1 a \omega_2 \vdash_{T_S,\, S(a,b)} \omega_1 b \omega_2$ for $b \in \Sigma$, $b \neq a$, where $S(a,b)$ is the
cost of replacing a by b. Let $S(a,a) = 0$.

(2) $\omega_1 a \omega_2 \vdash_{T_D,\, D(a)} \omega_1 \omega_2$, where $D(a)$ is the cost of deleting a
from $\omega_1 a \omega_2$.

(3) $\omega_1 a \omega_2 \vdash_{T_I,\, I(a,b)} \omega_1 b a \omega_2$ for $b \in \Sigma$, where $I(a,b)$ is the cost
of inserting b in front of a.

We further define the weight of inserting a terminal b at the end of a
string x to be

(4) $x \vdash_{T_I,\, I'(b)} xb$, for $b \in \Sigma$.

Let $x, y \in \Sigma^*$ be two strings, and J be a sequence of transformations used
to derive y from x. Let $|J|$ be defined as the sum of the weights associated
with the transformations in J; then the weighted distance between x and y,
$d^W(x,y)$, is defined as

$d^W(x,y) = \min_J \{|J|\}$     (2.2)


Equation  (2.2)  can  be  illustrated  by  a graphical  interpretation. 

From  point  B to  point  E,  each  path  in  the  lattice  shown  in  Figure  2.1 
corresponds  to  a sequence  of  transformations  used  to  derive  y from  x.  A 
horizontal  branch  indicates  an  insertion  transformation,  a vertical  branch 
indicates  a deletion  transformation,  and  a diagonal  branch  indicates  a 
substitution  or  a non-error  transformation.  The  weight  assigned  to  a 


particular  type  of  transformation  on  a particular  symbol  in  x is  labeled 
at its corresponding branch. Let J be a path in the lattice; then $|J|$
is the sum of the weights associated with the branches in J. The distance
$d^W(x,y)$ is the weight associated with the minimum-weight path.

We shall refer to the Levenshtein distance, weighted Levenshtein
distance and weighted distance as metric L, w, and W respectively, and
use d(x,y) to denote the distance between x and y based on any of the
three metrics.
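The lattice interpretation of (2.2) can be sketched as a shortest-path computation. The code below is an illustrative sketch under stated assumptions: the cost functions `S`, `D`, `I` and `I_end` (standing for S(a,b), D(a), I(a,b) and I'(b)) are passed in as Python callables, and the minimum is taken over lattice paths from B to E as described above. The function name and signature are ours.

```python
def weighted_distance(x, y, S, D, I, I_end):
    """Minimum-weight lattice path turning x into y (metric W)."""
    n, m = len(x), len(y)
    INF = float("inf")
    # T[i][j]: least cost of producing y[:j] after processing x[:i]
    T = [[INF] * (m + 1) for _ in range(n + 1)]
    T[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            c = T[i][j]
            if c == INF:
                continue
            if j < m:   # horizontal branch: insert y[j] in front of x[i], or at the end
                ins = I(x[i], y[j]) if i < n else I_end(y[j])
                T[i][j + 1] = min(T[i][j + 1], c + ins)
            if i < n:   # vertical branch: delete x[i]
                T[i + 1][j] = min(T[i + 1][j], c + D(x[i]))
            if i < n and j < m:   # diagonal branch: substitution or non-error
                T[i + 1][j + 1] = min(T[i + 1][j + 1], c + S(x[i], y[j]))
    return T[n][m]


# With unit weights the metric reduces to the Levenshtein distance.
unit_S = lambda a, b: 0 if a == b else 1
unit = lambda *args: 1
print(weighted_distance("cbabdbb", "cbbabbdb", unit_S, unit, unit, unit))
```

Choosing symbol-dependent costs, for example a small D(a) for a primitive that is often lost in segmentation, makes the corresponding errors cheaper than others of the same type.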




2.2.2  A Minimum- Distance  Error-Correcting  Parser 

Let L(G) be a given language and y be a given sentence. The essence
of minimum-distance error-correcting parsing is to search for a sentence
x in L(G) that satisfies the minimum-distance criterion

$d(x,y) = \min_z \{d(z,y) \mid z \in L(G)\}$     (2.3)

We note that the minimum-distance correction of y is y itself if
$y \in L(G)$.

We shall extend the minimum-distance ECP proposed by Aho and Peterson
[13] to all three types of metric: L, w, and W. In [13], the procedure
for constructing an ECP starts with the modification of a given grammar
G by adding the three types of error transformations in the form of
production rules, called error productions. The grammar G is thus expanded
to G' such that L(G') includes not only L(G), but all possible sentences
with the three types of errors. The parser constructed according to G',
with a provision added to count the number of error productions used in
a derivation, is the error-correcting parser for G. For a given sentence
y, the ECP will generate a parse, $\Pi$, which contains the smallest
number of error productions. A sentence x in L(G) that satisfies the
minimum-distance criterion (measured by using the Levenshtein distance) can
be generated from $\Pi$ by eliminating error productions. With some
modifications, this minimum-distance ECP can easily be extended to the three
metrics proposed in Section 2.2.1. We first give the algorithm for
constructing an expanded grammar, in which the nonnegative numbers associated
with error productions are the weights associated with their corresponding
error transformations with respect to the metric used.

Algorithm 2.1. Construction of expanded grammar

Input: A CFG $G = (N, \Sigma, P, S)$.

Output: A CFG $G' = (N', \Sigma', P', S')$ where P' is a set of weighted
productions.

Method:

Step 1. $N' = N \cup \{S'\} \cup \{E_a \mid a \in \Sigma\}$, $\Sigma' \supseteq \Sigma$.

Step 2. If $A \to \alpha_0 b_1 \alpha_1 b_2 \ldots b_m \alpha_m$, $m \geq 0$, is a production in P such
that $\alpha_i \in N^*$ and $b_i \in \Sigma$, then add $A \to \alpha_0 E_{b_1} \alpha_1 E_{b_2} \ldots E_{b_m} \alpha_m$, 0
to P', where each $E_{b_i}$ is a new nonterminal, $E_{b_i} \in N'$,
and 0 is the weight associated with this production.

Step 3. Add the following productions to P'.

                              Weight (metric)
      Production rule       L      w      W
(a)   $S' \to S$            0      0      0
(b)   $S' \to S'a$          1      $\delta$    $I'(a)$     for all $a \in \Sigma'$
(c)   $E_a \to a$           0      0      0         for all $a \in \Sigma$
(d)   $E_a \to b$           1      $\sigma$    $S(a,b)$    for all $a \in \Sigma$, $b \in \Sigma'$, and $b \neq a$
(e)   $E_a \to \lambda$     1      $\gamma$    $D(a)$      for all $a \in \Sigma$
(f)   $E_a \to bE_a$        1      $\delta$    $I(a,b)$    for all $a \in \Sigma$, $b \in \Sigma'$

In Algorithm 2.1 the production rules added in Step 3(b), 3(d),
3(e) and 3(f) are called error productions. Each error production
corresponds to one type of error transformation on a particular symbol
in $\Sigma$. Therefore, the distance measured in terms of error transformations
can be measured by the error productions used in a derivation. The parser is
a modified Earley's parsing algorithm with a provision added to accumulate
the weights associated with productions used in a derivation. The
algorithm is as follows.
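The construction in Algorithm 2.1 can be sketched in a few lines for metric L (every error production has weight 1). This is an illustrative sketch, not the authors' implementation: the representation of productions as `(lhs, rhs_tuple)` pairs, the naming scheme `E_a` and `S'`, and the function name `expand_grammar` are all our assumptions.

```python
def expand_grammar(terminals, productions, start, extra_terminals=()):
    """Return (new_start, weighted productions) following Steps 1-3, metric L.

    productions: list of (lhs, rhs) with rhs a tuple of symbols.
    extra_terminals: symbols in Sigma' that are not in Sigma (hypothetical option).
    """
    sigma = sorted(terminals)
    sigma_prime = sorted(set(terminals) | set(extra_terminals))
    E = {a: "E_" + a for a in sigma}           # Step 1: one new nonterminal per terminal
    new_start = start + "'"
    weighted = []
    for lhs, rhs in productions:               # Step 2: replace each terminal b by E_b
        weighted.append((lhs, tuple(E.get(s, s) for s in rhs), 0))
    weighted.append((new_start, (start,), 0))  # Step 3(a)
    for a in sigma_prime:                      # Step 3(b): insertion at the end
        weighted.append((new_start, (new_start, a), 1))
    for a in sigma:
        weighted.append((E[a], (a,), 0))       # Step 3(c): no error
        for b in sigma_prime:
            if b != a:
                weighted.append((E[a], (b,), 1))      # Step 3(d): substitution
        weighted.append((E[a], (), 1))         # Step 3(e): deletion (empty right side)
        for b in sigma_prime:
            weighted.append((E[a], (b, E[a]), 1))     # Step 3(f): insertion in front of a
    return new_start, weighted


# Example: G with productions S -> aSb | ab.
new_start, P = expand_grammar("ab", [("S", ("a", "S", "b")), ("S", ("a", "b"))], "S")
print(new_start, len(P))
```

For metrics w and W, the constant 1 would be replaced by the corresponding entry of the weight table in Step 3.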

Algorithm 2.2. Minimum-distance error-correcting parsing algorithm

Input: An expanded grammar $G' = (N', \Sigma', P', S')$ and an input
string $y = b_1 b_2 \ldots b_m$ in $\Sigma'^*$.

Output: $I_0, I_1, \ldots, I_m$, the parse lists for y, and d(x,y), where x is the
minimum-distance correction of y.

Method:

Step 1. Set j = 0. Then add $[E \to \cdot S', 0, 0]$ to $I_0$.

Step 2. If $[A \to \alpha \cdot B\beta, i, \xi]$ is in $I_j$, and $B \to \gamma, \eta$ is a production
rule in P', then add item $[B \to \cdot \gamma, j, 0]$ to $I_j$.

Step 3. If $[A \to \alpha \cdot, i, \xi]$ is in $I_j$ and $[B \to \beta \cdot A\gamma, k, \zeta]$ is in $I_i$,
and if no item of the form $[B \to \beta A \cdot \gamma, k, \phi]$ can be found in
$I_j$, then add an item $[B \to \beta A \cdot \gamma, k, \eta + \xi + \zeta]$ to $I_j$, where $\eta$ is
the weight associated with production $A \to \alpha$. If
$[B \to \beta A \cdot \gamma, k, \phi]$ is already in $I_j$, then replace $\phi$ by $\eta + \xi + \zeta$
if $\phi > \eta + \xi + \zeta$.

Step 4. If j = m go to Step 6; otherwise set j = j+1.

Step 5. For each item in $I_{j-1}$ of the form $[A \to \alpha \cdot b_j \beta, i, \xi]$ add
item $[A \to \alpha b_j \cdot \beta, i, \xi]$ to $I_j$; go to Step 2.

Step 6. If item $[E \to S' \cdot, 0, \xi]$ is in $I_m$, then $d(x,y) = \xi$, where x
is the minimum-distance correction of y; exit.

In Algorithm 2.2, the string x, which is the minimum-distance
correction of y, can be derived from the parse of y by eliminating all
the error productions. The extraction of the parse of y is the same as
that described in Earley's algorithm.
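The weight-accumulating parser can be sketched as follows. This is a simplified variant written for illustration, not a transcription of Algorithm 2.2: the weight of a production is charged when its item is predicted rather than when it is completed, predictor and completer are iterated to a fixed point within each parse list, and only the distance (not the parse) is returned. The function name, the item layout and the hand-built expanded grammar for {a^n b^n | n >= 1} are our assumptions.

```python
from collections import defaultdict
from math import inf


def min_distance(productions, start, y):
    """Least total weight of a derivation of y from `start`.

    productions: iterable of (lhs, rhs_tuple, weight); a symbol that is never
    a left-hand side is treated as a terminal.
    """
    by_lhs = defaultdict(list)
    for lhs, rhs, w in productions:
        by_lhs[lhs].append((tuple(rhs), w))
    m = len(y)
    # chart[j] maps an item (lhs, rhs, dot, origin) to the least weight found so far
    chart = [dict() for _ in range(m + 1)]

    def relax(j, item, w):
        if w < chart[j].get(item, inf):
            chart[j][item] = w
            return True
        return False

    for rhs, w in by_lhs[start]:
        relax(0, (start, rhs, 0, 0), w)
    for j in range(m + 1):
        if j > 0:   # scanner: move the dot over the terminal b_j
            for (lhs, rhs, dot, org), w in chart[j - 1].items():
                if dot < len(rhs) and rhs[dot] not in by_lhs and rhs[dot] == y[j - 1]:
                    relax(j, (lhs, rhs, dot + 1, org), w)
        changed = True
        while changed:   # repeat predictor and completer until nothing improves
            changed = False
            for (lhs, rhs, dot, org), w in list(chart[j].items()):
                if dot < len(rhs):   # predictor (no-op when rhs[dot] is a terminal)
                    for rhs2, w2 in by_lhs.get(rhs[dot], ()):
                        changed |= relax(j, (rhs[dot], rhs2, 0, j), w2)
                else:                # completer: add weights, keep the minimum
                    for (l2, r2, d2, o2), w2 in list(chart[org].items()):
                        if d2 < len(r2) and r2[d2] == lhs:
                            changed |= relax(j, (l2, r2, d2 + 1, o2), w + w2)
    return min((w for (lhs, rhs, dot, org), w in chart[m].items()
                if lhs == start and dot == len(rhs) and org == 0), default=inf)


# Expanded grammar (metric L) for S -> aSb | ab, written out by hand.
G = [("S", ("E_a", "S", "E_b"), 0), ("S", ("E_a", "E_b"), 0),
     ("S'", ("S",), 0), ("S'", ("S'", "a"), 1), ("S'", ("S'", "b"), 1)]
for a in "ab":
    E = "E_" + a
    G.append((E, (a,), 0))               # no error
    G.append((E, (), 1))                 # deletion
    for b in "ab":
        if b != a:
            G.append((E, (b,), 1))       # substitution
        G.append((E, (b, E), 1))         # insertion in front of a

for y in ["aabb", "aab", "b", ""]:
    print(repr(y), min_distance(G, "S'", y))
```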

2.2.3  A Minimum-Distance  Classifier  for  Noisy  Patterns 

In Section 2.2.1, three metrics are proposed as similarity measures
between two strings. We shall define the distance between a string and
a given language based on any one of the three metrics as follows.

Definition 2.5. Let y be a sentence, and L(G) be a given language.
The distance between L(G) and y, $d_K(L(G),y)$, where K is a given
positive integer, is

$d_K(L(G),y) = \min \{ \frac{1}{K} \sum_{i=1}^{K} d(z_i, y) \mid z_i \in L(G) \}$     (2.4)

In particular, if K = 1, then

$d_1(L(G),y) = \min \{ d(z,y) \mid z \in L(G) \}$     (2.5)

is the distance between y and its minimum-distance correction in
L(G).

Once the distance between a string (a syntactic pattern) and a language
(a set of syntactic patterns) is defined, a minimum-distance decision rule
can be stated as follows. Suppose that there are two classes of patterns,
$C_1$ and $C_2$, characterized by grammars $G_1$ and $G_2$ respectively. For a given
syntactic pattern y with unknown classification,

decide $y \in C_1$ if $d_K(L(G_1),y) < d_K(L(G_2),y)$,
decide $y \in C_2$ if $d_K(L(G_1),y) > d_K(L(G_2),y)$.     (2.6)

A block diagram of a minimum-distance classification system is given in
Figure 2.2.
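The decision rule (2.6) can be illustrated by brute force when each language is replaced by a finite sample of its sentences. This is only a sketch under that assumption: a real system would obtain the distances from the error-correcting parser, the sample lists below are hypothetical, and the averaging over the K nearest sentences follows Definition 2.5.

```python
def levenshtein(x, y):
    """Metric L between two strings."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        cur = [i]
        for j, b in enumerate(y, start=1):
            cur.append(min(prev[j - 1] + (a != b), prev[j] + 1, cur[j - 1] + 1))
        prev = cur
    return prev[-1]


def d_K(sample, y, K=1):
    """Average distance from y to its K nearest distinct sentences in the sample."""
    nearest = sorted(levenshtein(z, y) for z in set(sample))[:K]
    return sum(nearest) / len(nearest)


def classify(y, sample1, sample2, K=1):
    """Rule (2.6); a tie, which the rule leaves open, is resolved here as class 2."""
    return 1 if d_K(sample1, y, K) < d_K(sample2, y, K) else 2


L1 = ["ab", "aabb", "aaabbb"]       # finite sample of {a^n b^n}
L2 = ["ab", "abab", "ababab"]       # finite sample of {(ab)^n}
print(classify("aaabb", L1, L2, K=2), classify("abab", L1, L2))
```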

For a given L(G) and a string $y = b_1 b_2 \ldots b_m$, the MDECP described in
Algorithm 2.2 also generates the distance $d_1(L(G),y)$. For the computation
of $d_K(L(G),y)$, K > 1, Algorithm 2.2 needs further modification. The
following algorithm will generate parse lists in which each item in the
list $I_j$ is of the form $[A \to \alpha \cdot \beta, i, (\eta_1,\pi_1), \ldots, (\eta_e,\pi_e)]$ where
$e \leq K$. In each pair $(\eta_k, \pi_k)$, for $1 \leq k \leq e$, $\eta_k$ is the weight associated with
a derivation of the substring $b_{i+1} \ldots b_j$, and $\pi_k$ is the corresponding correction
of $b_{i+1} \ldots b_j$ from this derivation.

Algorithm 2.3. Computation of $d_K(L(G),y)$

Input: An expanded grammar $G' = (N', \Sigma', P', S')$, an input string
$y = b_1 b_2 \ldots b_m$ in $\Sigma'^*$, and K, a given positive integer.

Output: $d_K(L(G),y)$ and $x_1, x_2, \ldots, x_K$, the K nearest strings to y in L(G).

Method:

Step 1. Add item $[E \to \cdot S', 0, \phi]$ to $I_0$. Set j = 0.

Step 2. If $[A \to \alpha \cdot B\beta, i, (\eta_1,\pi_1) \ldots (\eta_e,\pi_e)]$ is in $I_j$ and $B \to \gamma$
is a production rule in P', then add item $[B \to \cdot \gamma, j, \phi]$ to $I_j$.

Step 3. If $[A \to \alpha \cdot B\gamma, h, (\eta_1,\pi_1) \ldots (\eta_e,\pi_e)]$ is in $I_i$ and
$[B \to \beta \cdot, i, (m_1,\tau_1) \ldots (m_f,\tau_f)]$ is in $I_j$:

(a) If no item of the form $[A \to \alpha B \cdot \gamma, h, (l'_1,\pi'_1) \ldots (l'_{g'},\pi'_{g'})]$
can be found in $I_j$, then add $[A \to \alpha B \cdot \gamma, h, (l_1,\pi'_1) \ldots (l_g,\pi'_g)]$
to $I_j$, where each pair is chosen from the set N,

$N = \{(\xi + \eta_p + m_q, \pi_p X) \mid 1 \leq p \leq e, 1 \leq q \leq f$, and $X = B$ if
$B \to \beta, \xi$ is an error production, or $X = \tau_q$ otherwise$\}$,

such that $l_1 \leq l_2 \leq \ldots \leq l_g$ are the g smallest left-hand-side
numbers having distinct right-hand sides in N, with
$g = |N|$ if $|N| < K$, or $g = K$ if $|N| \geq K$, where $|N|$ is the
number of elements in the set N.

(b) If an item of the form $[A \to \alpha B \cdot \gamma, h, (l'_1,\pi'_1) \ldots (l'_{g'},\pi'_{g'})]$
is already in $I_j$, then rearrange the set of pairs
$(l'_1,\pi'_1) \ldots (l'_{g'},\pi'_{g'})$ to be $(l_1,\pi_1) \ldots (l_g,\pi_g)$ such that
$\pi_1, \pi_2, \ldots, \pi_g$ are g distinct strings and $l_1 \leq l_2 \leq \ldots \leq l_g$
are the g smallest numbers in the set $N' = N \cup \{(l'_1,\pi'_1),
\ldots, (l'_{g'},\pi'_{g'})\}$, with $g = |N'|$ if $|N'| < K$, or $g = K$ if
$|N'| \geq K$.

Step 4. Repeat Step 3 until no new item can be found. Then if j = m,
go to Step 6; otherwise set j = j+1.

Step 5. For each item $[A \to \alpha \cdot b_j \beta, i, (\eta_1,\pi_1) \ldots (\eta_e,\pi_e)]$ in $I_{j-1}$,
add $[A \to \alpha b_j \cdot \beta, i, (\eta_1,\pi_1) \ldots (\eta_e,\pi_e)]$ to $I_j$. Go to
Step 2.

Step 6. If item $[E \to S' \cdot, 0, (\eta_1,\pi_1) \ldots (\eta_K,\pi_K)]$ is in $I_m$, then

$d_K(L(G),y) = \frac{1}{K} \sum_{i=1}^{K} \eta_i$.

The pairs $(\eta_k, \pi_k)$, $1 \leq k \leq K$, in the items of the parse lists are
added for bookkeeping of the K nearest strings in L(G) to the input y.
During the derivation of the substring $b_{i+1} \ldots b_j$, the corresponding corrected
strings are recorded so that identical strings caused by ambiguity of the
grammar will not appear in the final set of the K nearest strings.


2.3  Error-Correcting Parsing for Context-Free Programmed Languages

2.3.1  Context-Free Programmed Grammar

Definition 2.6. A context-free programmed grammar (CFPG) is a
5-tuple $G = (N, \Sigma, S, P, J)$ where

(1) N is a finite set of nonterminals

(2) $\Sigma$ is a finite set of terminals

(3) S is the start symbol in N

(4) P is a finite set of programmed productions

(5) J is a finite set of production labels

Each production in P consists of a label $r \in J$, a core production of the
form $A \to \alpha$ where $A \in N$, $\alpha \in (N \cup \Sigma)^*$, and a success branch field and a
failure branch field, each consisting of elements from J.

A derivation or generation in G proceeds as follows: the first
production is applied to the start symbol S; thereafter, if production r
is applied to the current sentential form $\gamma$ to rewrite a nonterminal A, and
if $\gamma$ contains at least one occurrence of A, then the leftmost A is rewritten
by the core of production r and the next production label is selected from
the success branch field of r; if the current sentential form does not
contain A, then the core of production r cannot be used and the next
production label is selected from the failure branch field of r; if the
applicable branch field is empty, the derivation halts.

Example 2.2. Consider the CFPG $G_p = (N, \Sigma, A, P, J)$ where
N = {A, B, C}, $\Sigma$ = {a, b, c}, J = {1, 2, 3, 4, 5} and P:

label   core            S(U)    F(W)
1       $A \to aBC$     2,4     $\phi$
2       $B \to aBB$     3       $\phi$
3       $C \to CC$      2,4     $\phi$
4       $B \to b$       4       5
5       $C \to c$       5       $\phi$

$L(G_p) = \{a^n b^n c^n \mid n \geq 1\}$
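The derivation rule of Definition 2.6 can be sketched by enumerating all derivations of the grammar in Example 2.2 up to a length bound. This is an illustrative sketch: the dictionary `RULES` encodes the table as (left side, right side, success field, failure field), the empty failure fields of rules 1 to 3 are an assumption on our part, and uppercase letters are taken to be nonterminals.

```python
RULES = {
    1: ("A", "aBC", (2, 4), ()),
    2: ("B", "aBB", (3,), ()),
    3: ("C", "CC", (2, 4), ()),
    4: ("B", "b", (4,), (5,)),
    5: ("C", "c", (5,), ()),
}


def generate(max_len):
    """All terminal strings of length <= max_len derivable in the CFPG."""
    results = set()
    stack = [("A", 1)]                  # (sentential form, label to apply next)
    while stack:
        form, r = stack.pop()
        lhs, rhs, success, failure = RULES[r]
        if lhs in form:
            form = form.replace(lhs, rhs, 1)    # rewrite the leftmost occurrence
            branch = success
        else:
            branch = failure                    # core not applicable
        if len(form) > max_len:
            continue                            # prune: forms never shrink
        if not branch:                          # empty branch field: derivation halts
            if form.islower():
                results.add(form)
            continue
        for t in branch:
            stack.append((form, t))
    return results


print(sorted(generate(9), key=len))
```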

A syntax analysis procedure for CFPG's has been proposed by Swain and
Fu [48]. The algorithm can be explained by using the following example.

Example 2.3. Figure 2.3 is a schematic diagram of the analysis of the
string abc with respect to grammar $G_p$. The notations used are
explained as follows:

(1) For any nonterminal $A \in N$, $A_i(\gamma)A_i$ indicates that the string $\gamma$ was
generated from the nonterminal A, which was rewritten as $\gamma$ at the
i-th step.

(2) A downward arrow indicates a generative step. An upward arrow
indicates backtracking. Backtracking occurs when a
generated sentential form is incompatible with the input string.
That is, the parser detects at this step that if the analysis
continues along the current path (or derivation), the string
generated will be different from the input.

(3) The branch labels have the form $\xi_k(\ell) = t$, where $\xi$ is S (success)
or F (failure); subscript k indicates the application of the k-th
production; $\ell$ indicates the selection of the $\ell$-th branch in the
$\xi$ field, which is the branch to the production labeled t. By
definition, $\xi_k(\ell) = \phi$ indicates the termination of a path down
the tree, which constitutes a successful parse if the current
sentential form is identical to the string being analyzed. The
analysis of the string is complete when $\xi_k(\ell)$ is undefined
for k = 1 and some $\ell$.

Figure 2.3  Analysis of abc with respect to $G_p$.


2.3.2  Error-Correcting  Parsing  Algorithms 

Using the parsing algorithm for CFPG proposed by Swain and Fu, a
parse is considered to be a failure when all the paths in the analysis
procedure terminate with a sentential form that is incompatible with the
input string.

Example 2.4. Given a sentence, aabc, the parsing with respect to
$G_p$ is illustrated in Figure 2.4. In Figure 2.4, the analysis is
complete without generating a successful parse. It concludes that
$aabc \notin L(G_p)$.

Let y be an input string and n be a preset nonnegative integer. Suppose
that parsing is allowed to continue along a path in which an incompatible
sentential form is generated. Backtracking occurs only when every string
that can be generated from the current sentential form is at a distance
greater than n from the input string. Using this
method, a syntactically correct sentence x can be generated such that
$d^L(x,y) \leq n$. A complete parsing procedure starts with n = 0. If the analysis
based on n fails, then n is increased by one and the analysis procedure
is repeated; otherwise parsing is completed and $d^L(x,y) = n$. The algorithm
is described as follows.


Figure 2.4  The syntax analysis of aabc with respect to $G_p$. (The analysis terminates when $S_1(3)$ is found to be undefined, and aabc is rejected.)


Algorithm 2.4. MDECP for CFPG

Input: A CFPG $G_p = (N, \Sigma, S, P, J)$ and an input string $y = b_1 b_2 \ldots b_m$.

Output: A string $x \in L(G_p)$, and $d^L(x,y)$.

Method:

Step 1. Set n = 0.

Step 2. Call subroutine PARSE(n).

Step 3. If "Analysis Fail", set n = n+1 and go to Step 2.

Step 4. $d^L(x,y) = n$; exit.

Subroutine PARSE(n)

Index and array:

$\xi_k(\ell)$ as explained in Example 2.3,
$f_k$ the left-hand side of the k-th rule,
$r_k$ the right-hand side of the k-th rule,
X the generated sentential form,
STACK(i, 1) indicates S or F, STACK(i, 2) indicates the production
rule label, and STACK(i, 3) indicates the branch in the S or F field at
the i-th step in a derivation.

Operator:

$X = X(f_t \xrightarrow{i} r_t)$ denotes that the leftmost nonterminal $f_t$ in X is
replaced by $r_t$ in the i-th step.

$X = X(f_t \xleftarrow{i} r_t)$ denotes that the substring $r_t$ in X is replaced by
$f_t$, where $r_t$ was placed in X at the i-th step.

Method:

Step 1. Set k = 1, i = 0, $\ell$ = 1, $\xi$ = S and $X = r_1$.

Step 2. If $\xi_k(\ell)$ is undefined then go to Step 5; otherwise let
$t = \xi_k(\ell)$. If $\xi$ = F, then go to Step 4.

Step 3. If $f_t$ cannot be found in X, then set $\xi$ = F, $\ell$ = 1 and $t = F_k(\ell)$.

Step 4. Let i = i+1, STACK(i, 1) = $\xi$, STACK(i, 2) = k, STACK(i, 3) = $\ell$.
Let $X = X(f_t \xrightarrow{i} r_t)$. Call subroutine COMPAT(n, X). If
"Compatible", and if X is a terminal string, then "Analysis
Success", exit; otherwise set k = t, $\ell$ = 1, and go to Step 2. If
"Incompatible", go to Step 5.

Step 5. If i = 0 then "Analysis Fail", exit. Otherwise set $X = X(f_t \xleftarrow{i} r_t)$,
$\xi$ = STACK(i, 1), k = STACK(i, 2), $\ell$ = STACK(i, 3)+1, i = i-1, and go to
Step 2.

Subroutine COMPAT(n, X)

Index and array:

M = |X|, the length of the sentential form X. Assume that $X = B_1 B_2 \ldots B_M$.

D(i, j) stores the number of potential errors between $B_1 \ldots B_i$ and
$b_1 b_2 \ldots b_j$.

Method:

Step 1. D(0, 0) = 0.

Step 2. Do i = 1 to M:
D(i, 0) = D(i-1, 0)+1 if $B_i$ is a terminal,
D(i, 0) = D(i-1, 0) if $B_i$ is a nonterminal.

Step 3. Do j = 1 to m:
D(0, j) = D(0, j-1)+1.

Step 4. Do i = 1 to M, do j = 1 to m:

(a) If $B_i$ is a terminal and $B_i \neq b_j$ then $m_1$ = D(i-1, j-1)+1,
otherwise $m_1$ = D(i-1, j-1).

(b) If $B_i$ is a terminal then $m_2$ = D(i-1, j)+1, otherwise
$m_2$ = D(i-1, j).

(c) If $B_i$ is a terminal then $m_3$ = D(i, j-1)+1, otherwise
$m_3$ = D(i, j-1).

D(i, j) = min($m_1$, $m_2$, $m_3$).

Step 5. If D(M, m) > n then X is "Incompatible", otherwise
"Compatible". Exit.


Example 2.5. Let the string aabc be parsed by Algorithm 2.4. The
result of the first analysis (n = 0) is shown in Figure 2.4. Since
"Analysis Fail" is reported, the algorithm then increases n by one.
Figure 2.5 describes the result of the second analysis, where an
"Analysis Success" is reported. The algorithm generates the
corrected string "abc", with $d^L(abc, aabc) = 1$.
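The compatibility test used by PARSE is a small dynamic program, and a sketch of it makes the role of nonterminals clear: a nonterminal is charged nothing, because it may still derive any substring. The code below follows Steps 1 to 5 of COMPAT; the function names and the convention that lowercase letters are terminals are our assumptions.

```python
def potential_errors(X, y, is_terminal=str.islower):
    """D(M, m): least number of errors between sentential form X and input y."""
    M, m = len(X), len(y)
    D = [[0] * (m + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):                      # Step 2
        D[i][0] = D[i - 1][0] + (1 if is_terminal(X[i - 1]) else 0)
    for j in range(1, m + 1):                      # Step 3
        D[0][j] = D[0][j - 1] + 1
    for i in range(1, M + 1):                      # Step 4
        B = X[i - 1]
        term = is_terminal(B)
        for j in range(1, m + 1):
            m1 = D[i - 1][j - 1] + (1 if term and B != y[j - 1] else 0)
            m2 = D[i - 1][j] + (1 if term else 0)
            m3 = D[i][j - 1] + (1 if term else 0)
            D[i][j] = min(m1, m2, m3)
    return D[M][m]


def compatible(n, X, y):
    """Step 5: X is compatible with y if at most n errors are needed."""
    return potential_errors(X, y) <= n


print(potential_errors("aBC", "aabc"), potential_errors("abc", "aabc"))
```

For the input aabc of Example 2.5, the form aBC is still compatible at n = 0, whereas the terminal string abc becomes compatible only at n = 1.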


2.4  Error-Correcting Parsing on a Stochastic Model

Basic notations and definitions of stochastic context-free grammar
(SCFG) given in [62-63] are briefly reviewed.

Definition 2.7. A SCFG is a 4-tuple $G_s = (N, \Sigma, P_s, S)$ where

N is a finite set of nonterminals,

$\Sigma$ is a finite set of terminals,

$P_s$ is a finite set of stochastic productions, each of which is
of the form

$A_i \xrightarrow{p_{ij}} \alpha_{ij}$,  $j = 1, 2, \ldots, n_i$,  $i = 1, 2, \ldots, \ell$,

where $n_i$ is the number of productions with $A_i$ at the left-hand side,
$\ell$ is the number of nonterminals, $A_i \in N$, $\alpha_{ij} \in (N \cup \Sigma)^*$, and $p_{ij}$
is the probability associated with this production. Furthermore,

$0 < p_{ij} \leq 1$ and $\sum_{j=1}^{n_i} p_{ij} = 1$,

and $S \in N$ is the start symbol.
[Figure 2.5: the second analysis (n = 1) of aabc with respect to $G_p$; the analysis is successful and aabc is reported to contain one error.]

Definition 2.8. The stochastic context-free language (SCFL)
generated by a SCFG $G_s$ is

$L(G_s) = \{(x, p(x)) \mid x \in \Sigma^*, S \xRightarrow{p_j} x, j = 1, 2, \ldots, k$, and $p(x) = \sum_{j=1}^{k} p_j\}$

where k is the number of all different derivations of x from S,
and $p_j$ is the probability associated with the j-th distinct
derivation of x.

Definition 2.9. A SCFL is consistent if $\sum_{x \in L} p(x) = 1$.

2.4.1  A Stochastic Model for Syntax Errors

Following the notation of error transformations, the deformation
probabilities associated with the three types of transformations, namely,
$T_S$, $T_I$, $T_D$, are defined as follows:

(1) $\omega_1 a \omega_2 \vdash_{T_S,\, q_S(b|a)} \omega_1 b \omega_2$, where $q_S(b|a)$ is the probability of
substituting terminal a by b, and $a \neq b$,

(2) $\omega_1 a \omega_2 \vdash_{T_I,\, q_I(b|a)} \omega_1 b a \omega_2$, where $q_I(b|a)$ is the probability of
inserting terminal b in front of a,

(3) $\omega_1 a \omega_2 \vdash_{T_D,\, q_D(a)} \omega_1 \omega_2$, where $q_D(a)$ is the probability of
deleting terminal a from a string,

(4) $x \vdash_{T_I,\, q'_I(a)} xa$, where $q'_I(a)$ is the probability of inserting
terminal a at the end of a string.


Let $q_S(a|a)$ be the probability that no error occurs on terminal a,
which could be interpreted as the probability of the non-error
transformation on a. Assume that for each terminal symbol there can be at
most one error. The deformation probabilities of this single-error
model are consistent if

$\sum_{b \in \Sigma} q_S(b|a) + q_D(a) + \sum_{b \in \Sigma} q_I(b|a) = 1$  for all $a \in \Sigma$     (2.7)


Let $\alpha \in \Sigma^*$ be a substring. The probability that symbol a is deformed
to $\alpha$, denoted $q(\alpha|a)$, is defined as follows:

$q(\alpha|a) = q_D(a)$  if $\alpha = \lambda$,
$q(\alpha|a) = \max\{q_S(b|a),\ q_I(b|a)\, q_D(a)\}$  if $\alpha = b$,     (2.8)
$q(\alpha|a) = q_I(b_1|a) \cdots q_I(b_{\ell-1}|a)\, \max\{q_S(b_\ell|a),\ q_I(b_\ell|a)\, q_D(a)\}$
  if $\alpha = b_1 \ldots b_\ell$, $\ell > 1$.

Note that a substitution error can also be considered as an insertion
transformation followed by a deletion transformation. The consistency
of this multiple-error stochastic model defined in (2.8) can easily be
proved from (2.7). Therefore, we have

$\sum_{\alpha \in \Sigma^*} q(\alpha|a) = 1$     (2.9)

The proof of (2.9) is given in Appendix A.

The probability of inserting $\alpha$, $\alpha \in \Sigma^*$, at the end of a string is
defined as

$q'(\alpha) = 1 - q'_I$  when $\alpha = \lambda$,
$q'(\alpha) = (1 - q'_I)\, q'_I(b_1)\, q'_I(b_2) \cdots q'_I(b_\ell)$  when $\alpha = b_1 \ldots b_\ell$, $\ell \geq 1$,     (2.10)

where $q'_I = \sum_{a \in \Sigma} q'_I(a)$.

Furthermore,

$\sum_{\alpha \in \Sigma^*} q'(\alpha) = (1 - q'_I) \sum_{i=0}^{\infty} (q'_I)^i = 1$     (2.11)
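Equation (2.11) can be checked numerically on a small alphabet: summing $q'(\alpha)$ over all strings of length at most n gives the partial geometric sum $1 - (q'_I)^{n+1}$, which tends to 1. The sketch below assumes hypothetical end-insertion probabilities for a two-symbol alphabet; the dictionary `q_end` and the function name are ours.

```python
from itertools import product


def end_insertion_probability(alpha, q_end):
    """q'(alpha) of equation (2.10); q_end maps each terminal a to q'_I(a)."""
    p = 1.0 - sum(q_end.values())      # factor (1 - q'_I)
    for b in alpha:
        p *= q_end[b]
    return p


q_end = {"a": 0.1, "b": 0.2}           # hypothetical values, q'_I = 0.3
max_len = 6
partial = sum(end_insertion_probability(s, q_end)
              for n in range(max_len + 1)
              for s in product("ab", repeat=n))
print(partial, 1 - 0.3 ** (max_len + 1))
```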


It is also assumed that for any string of symbols $a_1, a_2, \ldots, a_n \in \Sigma$,
and strings $\alpha_1, \alpha_2, \ldots, \alpha_n, \alpha_{n+1} \in \Sigma^*$, we have

$q(\alpha_1 \alpha_2 \ldots \alpha_{n+1} \mid a_1 a_2 \ldots a_n) = q(\alpha_1|a_1) \cdots q(\alpha_n|a_n)\, q'(\alpha_{n+1})$     (2.12)

With the deformation probability defined on each terminal of a
string, the probability of deforming string x to string y, $q(y|x)$, where
$x = a_1 a_2 \ldots a_n$, is defined as

$q(y|x) = \max_{1 \leq i \leq r} \left[ \prod_{j=1}^{n} q(\alpha^i_j \mid a_j) \right] q'(\alpha^i_{n+1})$     (2.13)

where $\alpha^i_1 \ldots \alpha^i_n \alpha^i_{n+1}$, $|\alpha^i_j| \geq 0$, is a partition of y into n+1 substrings,
$1 \leq i \leq r$, and r is the number of different ways of partitioning y into
n+1 substrings.

A graphical interpretation of (2.12) is particularly illustrative.
Consider an example in which $x = a_1 a_2 a_3$ and $y = b_1 b_2 b_3 b_4$, whose lattice
is shown in Figure 2.6. In this lattice, a horizontal branch indicates
an insertion transformation, a vertical branch indicates a deletion
transformation, and a diagonal branch indicates a substitution
transformation or a non-error transformation.

In Figure 2.6, each traverse from point B to point E represents
one way of deforming x into y. The heavy line indicates that $b_1$ is an
insertion in front of $a_1$, $b_2$ is a substitution for $a_1$, $a_2$ is deleted,
$b_3$ substitutes $a_3$, and $b_4$ is inserted at the end. This deformation is
made possible by partitioning y into $\alpha_1 \alpha_2 \alpha_3 \alpha_4$, where $\alpha_1 = b_1 b_2$,
$\alpha_2 = \lambda$, $\alpha_3 = b_3$, $\alpha_4 = b_4$. We have

$q(\alpha_1 \alpha_2 \alpha_3 \alpha_4 \mid a_1 a_2 a_3)$
$= q_I(b_1|a_1)\, q_S(b_2|a_1) \cdot q_D(a_2) \cdot q_S(b_3|a_3) \cdot (1 - q'_I)\, q'_I(b_4)$

Figure 2.6  The stochastic deformation model described by a lattice

There are numerous ways of deforming x to y; the deformation probability
$q(y|x)$ can be interpreted as the probability associated with the most
likely path from point B to point E.
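The "most likely path" reading of $q(y|x)$ can be sketched as a max-product dynamic program over the lattice. This is an illustrative sketch under stated assumptions: the probability functions `qS(b, a)`, `qI(b, a)`, `qD(a)`, `qI_end(b)` and the total `q_end_total` (standing for $q'_I$) are supplied by the caller, and the numerical values in the example are hypothetical, chosen so that (2.7) holds for a two-symbol alphabet.

```python
def deformation_probability(x, y, qS, qI, qD, qI_end, q_end_total):
    """Probability of the most likely lattice path deforming x into y."""
    n, m = len(x), len(y)
    # P[i][j]: best probability of producing y[:j] after processing x[:i]
    P = [[0.0] * (m + 1) for _ in range(n + 1)]
    P[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            p = P[i][j]
            if p == 0.0:
                continue
            if j < m:   # horizontal: insert y[j] in front of x[i], or at the end
                step = qI(y[j], x[i]) if i < n else qI_end(y[j])
                P[i][j + 1] = max(P[i][j + 1], p * step)
            if i < n:   # vertical: delete x[i]
                P[i + 1][j] = max(P[i + 1][j], p * qD(x[i]))
                if j < m:   # diagonal: substitution or non-error
                    P[i + 1][j + 1] = max(P[i + 1][j + 1], p * qS(y[j], x[i]))
    return P[n][m] * (1.0 - q_end_total)   # factor (1 - q'_I) from (2.10)


# Hypothetical model over {a, b}: 0.9 + 0.04 + 0.02 + 2 * 0.02 = 1, as (2.7) requires.
qS = lambda b, a: 0.9 if a == b else 0.04
qI = lambda b, a: 0.02
qD = lambda a: 0.02
qI_end = lambda b: 0.01
print(deformation_probability("ab", "ab", qS, qI, qD, qI_end, 0.02))
print(deformation_probability("ab", "b", qS, qI, qD, qI_end, 0.02))
```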

2.4.2  Maximum-Likelihood Error-Correcting Parser (MLECP)

Let $L(G_s)$ be a given SCFL, and y be a noisy string, $y \notin L(G_s)$. The
proposed maximum-likelihood error-correcting parsing algorithm is to
search for a string x, $x \in L(G_s)$, such that

$q(y|x)\, p(x) = \max_z \{q(y|z)\, p(z) \mid z \in L(G_s)\}$     (2.14)

where p(z) is the probability associated with z in $L(G_s)$.

Similar to Algorithm 2.1, the construction of expanded grammars
based on a stochastic deformation model is given as follows:

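Algorithm 2.5. Construction of stochastic expanded grammar

Input: A SCFG $G_s = (N, \Sigma, P_s, S)$.

Output: $G'_s = (N', \Sigma', P'_s, S')$, the stochastic expanded grammar.

Method:

Step 1. $N' = N \cup \{S'\} \cup \{E_a \mid a \in \Sigma\}$.

Step 2. $\Sigma' \supseteq \Sigma$.

Step 3. If $A \xrightarrow{p} \alpha_0 b_1 \alpha_1 b_2 \alpha_2 \ldots b_m \alpha_m$, $m \geq 0$, is a production in
$P_s$ such that $\alpha_i$ is in $N^*$ and $b_i$ is in $\Sigma$, then add the
production $A \xrightarrow{p} \alpha_0 E_{b_1} \alpha_1 E_{b_2} \ldots E_{b_m} \alpha_m$ to $P'_s$, where each
$E_{b_i}$ is a new nonterminal, $E_{b_i} \in N'$.

Step 4. Add to $P'_s$ the productions

(a) $S' \xrightarrow{1 - q'_I} S$, where $q'_I = \sum_{a \in \Sigma'} q'_I(a)$,

(b) $S' \xrightarrow{q'_I(a)} S'a$ for all $a \in \Sigma'$.

Step 5. For all $a \in \Sigma$, add to $P'_s$ the productions

(a) $E_a \xrightarrow{q_S(a|a)} a$,

(b) $E_a \xrightarrow{q_S(b|a)} b$ for all $b \in \Sigma'$, $b \neq a$,

(c) $E_a \xrightarrow{q_D(a)} \lambda$,

(d) $E_a \xrightarrow{q_I(b|a)} bE_a$ for all $b \in \Sigma'$.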

Suppose that y is an error-deformed string of x, $x = a_1 a_2 \ldots a_n$. By
using the productions added to $P'_s$ by Step 3, we have $S \xRightarrow{p_i}_{G'_s} X$, where
$X = E_{a_1} E_{a_2} \ldots E_{a_n}$, if and only if $S \xRightarrow{p_i}_{G_s} x$, where $p_i$ is the probability
of the i-th derivation of x in $G_s$. Applying Step 4(a) first and then
repeatedly applying Step 4(b), we can further derive
$S' \xRightarrow{p'_i} X \alpha_{n+1}$, where $p'_i = p_i\, q'(\alpha_{n+1})$.

The productions in Step 5 generate $E_{a_i} \xRightarrow{q(\alpha_i|a_i)} \alpha_i$ for all $1 \leq i \leq n$,
if $\alpha_1 \alpha_2 \ldots \alpha_{n+1}$ is a partition of y. Steps 5(a), (b), (c), and (d)
correspond to the non-error transformation, substitution transformation,
deletion transformation, and insertion transformation (which allows
multiple insertions), respectively.


Thus, the stochastic language generated by $G'_s$ is

$L(G'_s) = \{(y, p'(y)) \mid y \in \Sigma'^*,\ p'(y) = \sum_{x \in L(G_s)} \sum_{i=1}^{r} q_i(y|x)\, p(x)\}$

where r is the number of distinct sequences of transformations that derive
y from x, and $q_i(y|x)$ is the probability associated with the i-th
sequence, $1 \leq i \leq r$.

The consistency of $L(G'_s)$ can be proved from equations (2.9), (2.11)
and (2.12).

It is proposed to use a modified Earley's parser on $G'_s$ to implement
the search for the most likely correction of a noisy input. The
algorithm is essentially Earley's algorithm with a provision added to
accumulate the probabilities associated with each step of a
derivation.

Algorithm 2.6. Maximum-Likelihood Error-Correcting Algorithm

Input: A stochastic expanded grammar $G_s' = (N', \Sigma', P_s', S')$ of $G_s$, and string $y = b_1 b_2 \ldots b_m$ in $\Sigma'^*$.

Output: Parse lists $I_0, I_1, \ldots, I_m$ of $y$.

Method:

Step 1. Set $j = 0$, and add $[E \to \cdot\, S', 0, 1]$ to $I_j$.

Step 2. (a) If $[A \to \alpha \cdot B\beta, i, p]$ is in $I_j$, and $B \to \gamma$ is a production in $P_s'$, add item $[B \to \cdot\, \gamma, j, 1]$ to $I_j$.

(b) If $[A \to \alpha\, \cdot, i, p]$ is in $I_j$ and $[B \to \beta \cdot A\gamma, k, q]$ is in $I_i$, and if no item of the form $[B \to \beta A \cdot \gamma, k, r]$ can be found in $I_j$, add a new item $[B \to \beta A \cdot \gamma, k, tpq]$ to $I_j$, where $t$ is the probability associated with $A \to \alpha$ in $P_s'$. If $[B \to \beta A \cdot \gamma, k, r]$ is already in $I_j$, then replace $r$ by $tpq$ if $tpq > r$.

Step 3. If $j = m$, go to Step 5. Otherwise, $j = j + 1$.

Step 4. For each item in $I_{j-1}$ of the form $[A \to \alpha \cdot b_j \beta, i, p]$, add item $[A \to \alpha b_j \cdot \beta, i, p]$ to $I_j$; go to Step 2.

Step 5. If item $[E \to S'\, \cdot, 0, p]$ is in $I_m$, exit.
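The only departure from the ordinary Earley completer is the bookkeeping in Step 2(b): an item carries a probability, and a duplicate item replaces the stored probability only when the new value $tpq$ is larger. A minimal sketch of that update follows; the item encoding `(lhs, rhs, dot, origin)` and the function names are illustrative assumptions, not the report's Fortran implementation.

```python
def add_or_update(parse_list, item, prob):
    """Store item with prob, keeping only the largest probability seen so far.

    parse_list maps an item (lhs, rhs, dot, origin) to its probability.
    Returns True when the list changed, so a caller knows to reprocess the item.
    """
    old = parse_list.get(item)
    if old is None or prob > old:
        parse_list[item] = prob
        return True
    return False


def complete(parse_list_j, waiting_item, q, p, t):
    """Step 2(b): advance the dot of a waiting item [B -> beta . A gamma, k, q].

    p is the probability stored with the completed item [A -> alpha ., i, p]
    and t is the probability of the production A -> alpha.
    """
    lhs, rhs, dot, origin = waiting_item
    advanced = (lhs, rhs, dot + 1, origin)
    return add_or_update(parse_list_j, advanced, t * p * q)


# Two competing derivations of the same nonterminal A over the same substring.
I_j = {}
waiting = ("B", ("A", "c"), 0, 0)
complete(I_j, waiting, q=1.0, p=0.2, t=0.5)   # stores 0.1
complete(I_j, waiting, q=1.0, p=0.9, t=0.5)   # replaces it with 0.45
complete(I_j, waiting, q=1.0, p=0.1, t=0.5)   # ignored, 0.05 is smaller
```

Keeping a single probability per item is what makes the result a maximum over derivations rather than a sum, which is why the final value can be read as $q(y|x)p(x)$ only when $x$ has one derivation.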

The extraction of the most likely correction of $y$, i.e., the $x$ that satisfies equation (2.14), is the same as in MDECP.

From Step 5, the probability $p$ in the item $[E \to S'\, \cdot, 0, p]$ of the last parse list, $I_m$, gives the value of $q(y|x)p(x)$ for some $x \in L(G_s)$ that satisfies equation (2.14), provided that the stochastic grammar $G_s$ is unambiguous. Whenever a substring has more than one derivation, the algorithm chooses the one associated with the largest probability. Consequently, the derived number $p$ is in general $q(y|x)p_j(x)$, where $x$ is the most likely correction of $y$ and $p_j(x)$ is the probability associated with the $j$th distinct derivation of $x$ with respect to $G_s$ (refer to Definition 2.8). Therefore, only when $x$ has one derivation can $p$ be interpreted as $q(y|x)p(x)$.


2.4.3.  Bayes  Classification  of  Noisy  Patterns 

Assume that there are two classes of syntactic patterns, $C_1$ and $C_2$. Let $x$ be a pattern. Suppose that the probability density function for $x$ in $C_i$, $p(x|C_i)$, and the a priori probability of $C_i$, $P(C_i)$, where $i = 1, 2$, are known. Using Bayes rule, the a posteriori probability that $x$ is in class $j$ is

$$P(C_j|x) = \frac{p(x|C_j)P(C_j)}{\sum_{i=1}^{2} p(x|C_i)P(C_i)} \qquad \text{for } j = 1, 2 \tag{2.15}$$

Then, the maximum-likelihood decision rule (or Bayes decision rule) is

$$\text{decide } x \in \begin{matrix} C_1 \\ C_2 \end{matrix} \quad \text{if} \quad P(C_1|x) \gtrless P(C_2|x) \tag{2.16}$$

Let two sets of training patterns be given for $C_1$ and $C_2$, respectively. Two stochastic grammars $G_1 = (N_1, \Sigma_1, P_1, S_1)$ and $G_2 = (N_2, \Sigma_2, P_2, S_2)$ are constructed to characterize $C_1$ and $C_2$ respectively, such that the probability of a sentence in $L(G_i)$ yields the probability of its corresponding syntactic pattern in $C_i$, for $i = 1, 2$. Let the probability of $x$ in $L(G_i)$ be denoted as $p(x|G_i)$, where $x$ is a given sentence. $p(x|G_i)$ can be taken as $p(x|C_i)$ in equation (2.15). The maximum-likelihood decision rule is then applied for recognition, provided that $x$ is in $L(G_1) \cup L(G_2)$ [64]. We shall rewrite equations (2.15) and (2.16) as follows:

$$\text{decide } x \in \begin{matrix} C_1 \\ C_2 \end{matrix} \quad \text{if} \quad p(x|G_1)P(C_1) \gtrless p(x|G_2)P(C_2) \tag{2.17}$$

Note that $p(x|G_i) = 0$ if $x \notin L(G_i)$.

The purpose of this section is to provide a recognition rule that minimizes the error rate for a given string, even if it is not in any of the languages under consideration. Let the deformation probabilities of terminals in $G_1$ and $G_2$ be known. Let $G_1'$ and $G_2'$ be the stochastic expanded grammars for $G_1$ and $G_2$ respectively, according to the deformation probabilities. Given a string $y$, we shall interpret the term $q_i(y|x)p(x)$ that satisfies equation (2.14) as the probability that $y$ is an error-deformed string of $L(G_i)$, and denote it as $q(y|G_i')$, where $q_i(\cdot|\cdot)$ denotes the deformation probability of terminals in $G_i$. Then equation (2.14) can be rewritten as

$$q(y|G_i') = \max_{z} \{\, q_i(y|z)\,p(z) \mid z \in L(G_i) \,\} \tag{2.18}$$

The use of the expanded grammar $G_i'$ enlarges the probability space from $L(G_i)$ to $L(G_i')$, in which the deformation probabilities are employed to adjust the density function $p(x|G_i)$ for $x \in L(G_i)$. Consequently, the probability density function defined in equation (2.18) for sentences in $L(G_i')$ more closely yields the distribution of syntactic patterns in $C_i$. Let the recognition rule be as follows:

$$\text{decide } y \in \begin{matrix} C_1 \\ C_2 \end{matrix} \quad \text{if} \quad q(y|G_1')P(C_1) \gtrless q(y|G_2')P(C_2) \tag{2.19}$$
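As a small numerical illustration of rule (2.19), the sketch below compares the products $q(y|G_i')P(C_i)$ and picks the larger one. The scores would come from the maximum-likelihood error-correcting parser of each class; the values and the helper name `classify` used here are made up for illustration only.

```python
def classify(scores, priors):
    """Bayes rule (2.19): choose the class maximizing q(y|G_i') * P(C_i).

    scores[c] is the parser output q(y|G_c') for class c and
    priors[c] is the a priori probability P(C_c).
    """
    weighted = {c: scores[c] * priors[c] for c in scores}
    best = max(weighted, key=weighted.get)
    return best, weighted


# Hypothetical parser outputs for one noisy string y.
scores = {"C1": 1.0e-4, "C2": 3.0e-4}
priors = {"C1": 0.8, "C2": 0.2}
decision, weighted = classify(scores, priors)
```

In this made-up case the class with the smaller likelihood still wins because its prior is four times larger. The same function extends unchanged to more than two classes.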

2.4.4. An Illustrative Example

A chromosome pattern classification problem is used as an example. Consider that there are four different types of chromosomes (submedian, median, acrocentric, and telocentric), denoted as $C_S$, $C_M$, $C_A$, and $C_T$, respectively [64,65]. The typical chromosome patterns of each of the four types are illustrated in Figure 2.8. Let the SCFG's for $C_S$, $C_M$, $C_A$, and $C_T$ be $G_S$, $G_M$, $G_A$, and $G_T$, respectively. Segmentation errors due to noise and distortion in input patterns are considered as insertion or deletion errors, and primitive extraction errors are considered as substitution errors. With a set of training samples, we can estimate deformation probabilities based on the segmentation and primitive extraction errors committed by the syntactic pattern recognition system.

Then the MLECP's are constructed from the estimated deformation probabilities and SCFG's for each type of chromosome. A set of samples, generated from the stochastic expanded grammars $G_S'$, $G_M'$, $G_A'$, and $G_T'$ (Algorithm 2.5), is then used as the test samples. They are classified by a Bayes classification system shown in Figure 2.7. All the programs in this paper are written in Fortran IV on a CDC 6500 computer.

Figure 2.7 Bayes classification system for noisy strings

[Figure 2.8: typical chromosome patterns. String representations recovered from the figure: acrocentric cadbbbabbcbbabbbda; telocentric ebbbabbcbbabbb]


(A)  Pattern  Grammars 

Let the a priori probabilities $P(C_S)$, $P(C_M)$, $P(C_A)$ and $P(C_T)$ be .5, .26, .20, and .04, respectively [64]. The pattern grammars $G_S$, $G_M$, $G_A$, and $G_T$ are given in Table 2.1. The deformation probabilities of each terminal are given in Table 2.2. For illustration only, we assume that the deformation probabilities of different classes are the same; the probabilities in Table 2.1 and Table 2.2 are rather arbitrarily assigned.


G_S = (N_S, Σ_S, P_S, S)
    N_S = {S, W, U, M, N, R, L, Q, F}
    Σ_S = {a, b, c, d}

G_M = (N_M, Σ_M, P_M, S)
    N_M = {S, W, M, N, F}
    Σ_M = {a, b, c, d}

G_A = (N_A, Σ_A, P_A, S)
    N_A = {S, U, W, Q, M, N, F}
    Σ_A = {a, b, c, d}

G_T = (N_T, Σ_T, P_T, S)
    N_T = {S, M, N, F}
    Σ_T = {a, b, c, e}

[Production sets: only fragments are recoverable, among them N → bNb, R → bRb, R → aQF, L → bLb, L → FQa, F → bFb, F → bcb]

Table 2.1 G_S, G_M, G_A, G_T for submedian, median, acrocentric, and telocentric, respectively.


q_S(a|a)=.97    q_S(b|a)=0.    q_S(c|a)=0.    q_S(d|a)=.008  q_S(e|a)=0.

q_S(b|b)=.963   q_S(a|b)=0.    q_S(c|b)=0.    q_S(d|b)=.012  q_S(e|b)=.003

q_S(c|c)=.965   q_S(a|c)=0.    q_S(b|c)=0.    q_S(d|c)=.003  q_S(e|c)=.01

q_S(d|d)=.955   q_S(a|d)=.01   q_S(b|d)=.01   q_S(c|d)=.003  q_S(e|d)=0.

q_S(e|e)=.965   q_S(a|e)=0.    q_S(b|e)=.003  q_S(c|e)=.01   q_S(d|e)=0.

q_D(a)=.01      q_D(b)=.01     q_D(c)=.01     q_D(d)=.01     q_D(e)=.01

q_I(j|i)=.0024 for all i, j ∈ {a,b,c,d,e}
q_I'(i)=0. for all i ∈ {a,b,c,d,e}

Table 2.2 Probabilities of terminal error transformations.


As discussed in [13], the time complexity of minimum-distance error-correcting parsing is $O(n^3)$. In a stochastic model, some of the probabilities associated with items in parse lists are so small that their corresponding derivations hardly ever occur. We may eliminate these unnecessary derivations by comparing the probability of an item with a lower bound before it is added to the parse list.

Let $[A \to \alpha \cdot \beta, i, p]$ be an item in $I_j$. We set $\delta$ as a lower bound. Then $[A \to \alpha \cdot \beta, i, p]$ is allowed to be added to $I_j$ only if $p \ge \delta$. A good selection of $\delta$ should depend on the tolerance of error density of a string. We use $C(A)^{j-i}$ as the lower bound in this paper, where $A$, $C$ are constants. The value of $C$ should be less than the smallest deformation probability to permit its associated error to exist. The value of $A$, which depends on the stochastic grammar rules, should be small enough to guarantee successful parsing of noisy strings.
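The pruning test can be sketched in a few lines. The sketch assumes that the exponent of the bound is the length $j - i$ of the substring covered by the item (the exponent is partly illegible in the scanned source), and it uses the constants $C = .001$ and $A = .5$ from the example that follows. The function names are hypothetical.

```python
def lower_bound(i, j, C=0.001, A=0.5):
    """Lower bound C * A**(j - i) for an item in parse list I_j with origin i.

    Assumption: the exponent is the length of the covered substring, j - i.
    """
    return C * A ** (j - i)


def keep_item(p, i, j, C=0.001, A=0.5):
    """Return True when an item with probability p may be added to I_j."""
    return p >= lower_bound(i, j, C, A)


# An item covering one symbol survives with one deletion-sized factor (.01),
# but not with two of them (.0001 < .0005).
one_error = keep_item(0.01, 0, 1)
two_errors = keep_item(0.01 * 0.01, 0, 1)
```

Because the bound shrinks geometrically with the covered length, longer substrings tolerate more errors, which is the behaviour tabulated in Table 2.3.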

For the string $y$ = cbabdbabcbabdbab, Table 2.3 gives the approximate maximum number of errors allowed in a substring of given length when parsed by $G_M'$ with $C = .001$, $A = .5$.

To illustrate the amount of parsing time saved by applying this technique, we parse a string by $G_M'$ both with a preset lower bound and without a lower bound. Figure 2.9 shows this comparison in terms of the number of items in each parse list when the string cbabbdbbabcbabbdbbab is parsed by $G_M'$ with the lower bound $A = .5$, $C = .001$. Total cpu time is 5.1* seconds with the preset lower bound and 3**.3 sec. without.

length of substring             1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16
maximum no. of errors allowed   1  1  1  1  1  2  2  2  2   2   2   2   2   3   3   3

Table 2.3 Maximum number of errors allowed in substrings of cbabdbabcbabdbab when parsed by $G_M'$ with lower bound $.001(.5)^{j-i}$.

Figure 2.10 shows a plot of parsing time versus string length using the MLECP constructed from $G_S'$ with lower bound $.001(.5)^{j-i}$. The time complexity with the present lower bound appears to be $O(n^{\ell})$, $2 \le \ell \le 3$, and $\ell$ varies with the values of the constants $A$ and $C$.

The trade-off of this speed-up is the generality of MLECP. From Table 2.3, we notice that no string with consecutive errors can be successfully parsed in this example, but from $G_M'$, there is nearly a 2.5% chance of consecutive errors for a string of length 16. Therefore, the values of the constants $A$ and $C$ should be carefully chosen such that the MLECP reaches a good compromise between its parsing time and its error-correcting capability.

(C)  Classification  Result 

Thirty-two test samples are generated from $G_S'$, $G_M'$, $G_A'$, and $G_T'$ with average string length 27, of which 20 are erroneous. The 32 samples are then tested by the Bayes classifier with lower bound set at $.001(.5)^{j-i}$. The result is that, among 32 samples, 29 are correctly classified. The remaining three samples are too noisy to be correctly classified due to the use of the lower bound in the MLECP's. On the average, it takes 32.6 sec. to classify a string, with an overall accuracy of 90.6%.

2.4.5. Sequential Classification of Noisy Patterns

In the previous section, we have designed MLECP's for the classification of noisy and distorted patterns. As was shown, even if the lower bounds of the ECP are skillfully selected, the use of MLECP's is still too costly to be practically feasible. Persoon and Fu have proposed a sequential parsing scheme for strings generated from SCFG's [10]. By using an optimal decision rule and a suboptimal stopping rule, the parser scans only part of the input string before a decision is made. For a given specified probability of error, $\hat{e}$, the sequential parsing scheme should be designed to minimize the average number of terminal symbols scanned.

The sequential classification algorithm (SCA) consists of a decision algorithm and a sequential parsing algorithm (SPA), which is essentially an extension of Earley's parser. Since errors are randomly distributed in a string, and SCA requires only part of the string to be parsed, chances are that the parsing of a noisy string by SCA will be successful before an error has ever been detected. Otherwise, a forced decision is made based on the information accumulated in the provision of SPA.

Let $y = a_1 \ldots a_j a_{j+1} \ldots a_n$ be a noisy input string. Suppose that we have observed $a_1 \ldots a_j$ ($j \le n$). The sequential parsing algorithm (SPA) computes $p(a_1 a_2 \ldots a_j | C_i)$, which is the probability that $a_1 \ldots a_j$ is a string in $L(G_i)$, and $p(a_1 \ldots a_j \Sigma^* | C_i)$, which is the total probability of strings in $L(G_i)$ with $a_1 \ldots a_j$ as their prefix. By using these two quantities, a stopping rule tells when one has to stop observing more terminals, and a decision rule assigns a class to $y$ once the stopping rule indicates to stop.

The SCA designed for processing erroneous strings is similar to the algorithms presented in [10] for non-error-correcting parsing. A forced decision rule is added to the algorithms such that a decision can be reached when parsing is terminated by illegitimate terminals. The restriction on using $\lambda$-productions in [10] is removed. The algorithms are given in Appendix B.

Two hundred test samples are generated from $G_S'$, $G_M'$, $G_A'$, and $G_T'$. Among them, 112 are erroneous. The average string length is 26. The classification results of using SCA with a preset error bound $\hat{e}$ are illustrated in Table 2.4, which shows classification accuracy and parsing time with respect to various $\hat{e}$. In the second experiment, the 200 test samples are passed through a non-sequential Earley's parser constructed from $G_S$, $G_M$, $G_A$ and $G_T$. The result is given in the last row of Table 2.4. For the purpose of comparison, the third experiment uses 200 correct strings generated from the SCFG's $G_S$, $G_M$, $G_A$, and $G_T$ as test samples. They are classified by using SCA. The results are shown in Table 2.5. Curves showing accuracy vs. parsing time of Experiments 1, 2, and 3 are given in Figure 2.11.

From Table 2.5, if the parsed strings are error-free, the accuracy of sequential classification increases monotonically as the number of symbols scanned increases. One hundred percent accuracy may be reached when the error bound is sufficiently small. In contrast, the curve of accuracy vs. parsing time (or the number of parsed symbols per string) is convex when the parsed strings are noisy. In this case, the accuracy of the sequential classifier cannot reach beyond a maximum point. In our example, the maximum is 87% accuracy when the error bound is .05, with average parsing time .6 sec per string. The interpretation is intuitive: the more symbols are scanned, the higher the accuracy SCA can reach, but also the better the chance that an error is detected. As soon as an error is announced by the SPA of all the pattern grammars, a forced decision, whose low accuracy offsets this gain, must be made.

Figure 2.11 Accuracy vs. parsing time of sequential classifier and nonsequential classifier (legend: • sequential classification of noisy strings; o non-sequential classification of noisy strings; x sequential classification of unerring strings)

                     error     no. of                          total cpu    ave. no. of parsed
                     bound ê   misclassification   accuracy    time (sec)   symbols per string
Sequential           .0         33                 83.5%       162.5        16
Classifier           .03        26                 87%         119.0         8
                     .05        26                 87%         118.5         8
                     .1         27                 86.5%       116.5         7
                     .15        27                 86.5%       113.4         7
                     .18        30                 85%         108.2         6
                     .21        30                 85%         104.8         6
                     .25        32                 84%          88.8         5
                     .3         35                 82.5%        82.7         4
                     .35        57                 71.5%        52.4         1
                     .4         57                 71.5%        49.1         1
                     .5         93                 53.5%        37.1         1
                     .6        101                 49.5%         .036        0
Non-sequential
Classifier                     112                 44%          27.0

Table 2.4 Classification results from 200 strings (112 of them are erroneous)

                     error     no. of                          total cpu    ave. no. of parsed
                     bound ê   misclassification   accuracy    time (sec)   symbols per string
Sequential           .1          0                 100%        128.2         8
Classifier           .2         11                 94.5%       110.2         7
                     (illegible) 51                74.5%        50.0         1

Table 2.5 Classification results from 200 correct strings.
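The accuracy column of Table 2.4 follows directly from the misclassification counts, which gives a quick consistency check on the reconstructed figures. The sketch below recomputes a few entries; the helper name `accuracy` is ours, and the counts are those listed in Table 2.4 and in the 32-sample experiment of Section 2.4.4.

```python
def accuracy(misclassified, total=200):
    """Classification accuracy in percent for a test set of the given size."""
    return 100.0 * (total - misclassified) / total


# Misclassification counts from Table 2.4, keyed by error bound.
table_2_4 = {0.0: 33, 0.03: 26, 0.05: 26, 0.1: 27, 0.3: 35, 0.6: 101}
recomputed = {bound: accuracy(m) for bound, m in table_2_4.items()}

non_sequential = accuracy(112)          # last row of Table 2.4
mlecp_32_samples = accuracy(3, total=32)  # 29 of 32 correct in Section 2.4.4
```

The peak of 87% sits at error bounds .03 and .05; both smaller and larger bounds give lower accuracy, which is the convex behaviour described above.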

To increase accuracy beyond the maximum accuracy of the sequential classifier, we can use a sequential error-correcting parser as the classifier, that is, an SCA with its SPA constructed from error-induced grammars. The results of classifying 200 noisy strings are given in Table 2.6. When the error bound is .15, the sequential error-correcting classifier reaches 94% accuracy with an average of 19.1 sec per string. Compared with the result of 90.2% accuracy and an average of 32.6 sec per string using a non-sequential error-correcting classifier in Section 2.4.3, the sequential version achieves a slightly higher accuracy with less average parsing time.

The effectiveness of using SCA for processing noisy strings is largely dependent on the pattern grammars. It is noted that most misclassifications of SCA occur between median and submedian, because no distinctive difference appears in the first few symbols of their string representations. By contrast, it takes only two or three symbols to discriminate acrocentric from the other three types with very high accuracy. We may conclude that, to increase both accuracy and efficiency using SCA, pattern grammars should be carefully constructed such that informative symbols appear in the first few positions of derived strings.

[Table 2.6: column headings only are recoverable: error bound ê, misclassification, accuracy, total cpu, ave. no. of parsed symbols per string]


CHAPTER  3 


ERROR-CORRECTING  TREE  AUTOMATA 


3.1 Introduction

In applying syntactic methods to pattern recognition, one-dimensional (string) grammars are sometimes inefficient in describing two- or three-dimensional patterns. For the purpose of effectively describing high-dimensional patterns, high-dimensional grammars such as web grammars, graph grammars, and tree grammars have been proposed [50-52]. Properties of generalized finite automata, called tree automata, which accept finite trees of symbols as their input, have been studied by several authors [66-70]. Brainerd [66] proves that the class of systems which generates exactly the sets of trees accepted by the automata is a regular system. Fu and Bhargava [50] introduced the application of tree systems into pattern recognition. In practical applications, tree grammars and tree automata have been used in the classification of fingerprint patterns [38], the analysis of bubble chamber events [7^], and the interpretation of LANDSAT data [39], etc.

The descriptive power of tree languages and the efficient analytical capability of tree automata make the tree system approach to pattern recognition very attractive. This chapter is concerned with the error-correcting version of tree automata. Unlike the string case, where the only relation between symbols is left-right concatenation, a tree structure would be deformed under deletion or insertion errors. The structure-preserved error-correcting tree automaton (SPECTA) proposed in Section 3.3 takes only substitution errors into consideration. By introducing a blank element, a deletion error can be treated as substitution of a non-blank element by a blank element, and an insertion error becomes a non-blank element in substitution for a blank element. An example of using SPECTA in LANDSAT data interpretation is presented.

In Section 3.4, syntax errors on trees are defined in terms of five error transformations; namely, substitution, stretch, branch, split, and deletion. The distance between two trees is the cost of the least-cost sequence of error transformations needed to transform one into the other. Based on this tree metric, a generalized error-correcting tree automaton (GECTA) is formulated, where transformations made on each terminal symbol are added to the system in the form of transition rules. An example of hand-printed character recognition is given to demonstrate the operation of GECTA.

3.2 Definitions

In this section, some basic definitions on trees, tree grammars, and tree automata given by Brainerd [66] are briefly reviewed.

Definition 3.1. Let $N^+$ be the set of positive integers. Let $U$ be the free monoid generated by $N^+$. Let $\cdot$ be the operation and $0$ the identity of $U$. The depth of $a \in U$ is denoted $d(a)$ and defined as follows: $d(0) = 0$, $d(a \cdot i) = d(a) + 1$, $i \in N^+$. $a \le b$ iff there exists $x \in U$ such that $a \cdot x = b$. $a$ and $b$ are incomparable iff $a \not\le b$ and $b \not\le a$.

Definition 3.2. $D$ is a tree domain iff $D$ is a finite subset of $U$ satisfying (1) $b \in D$ and $a < b$ implies $a \in D$, and (2) $a \cdot j \in D$ and $i < j$ in $N^+$ implies $a \cdot i \in D$.

Definition 3.3. A rank is a pair $\langle \Sigma, r \rangle$ where $\Sigma$ is a finite set of symbols and $r: \Sigma \to N$. Let $\Sigma_n = r^{-1}(n)$.

Definition 3.4. A tree over $\Sigma$ (i.e., over $\langle \Sigma, r \rangle$) is a function $\alpha: D \to \Sigma$ such that $D$ is a tree domain and $r[\alpha(a)] = \max\{ i \mid a \cdot i \in D \}$, i.e., the rank of a label at $a$ must be equal to the number of branches in the tree domain at $a$. The domain of a tree is denoted $D(\alpha)$ or $D_\alpha$. Let $T_\Sigma$ be the set of all trees over $\Sigma$. The depth of $\alpha$ is defined as $d(\alpha) = \max\{ d(a) \mid a \in D_\alpha \}$.

Definition 3.5. Let $a, b, b' \in U$ such that $b = a \cdot b'$; then $b/a = b'$. $b/a$ is not defined unless $a \le b$.

Definition 3.6. Let $\alpha \in T_\Sigma$ and $a \in D_\alpha$; $\alpha/a = \{ (b, x) \mid (a \cdot b, x) \in \alpha \}$. $\alpha/a$ is the subtree of $\alpha$ at $a$, and $\alpha/a$ occurs at $a$ in $\alpha$.

Definition 3.7. Let $\alpha \in T_\Sigma$, $a \in U$; then $a \cdot \alpha = \{ (b, x) \mid (b/a, x) \in \alpha \}$.

Definition 3.8. Let $a \in D_\alpha$, $\alpha, \beta \in T_\Sigma$; then $\alpha(a \leftarrow \beta) = \{ (b, x) \in \alpha \mid b \not\ge a \} \cup a \cdot \beta$. This is the result of replacing the subtree $\alpha/a$ at $a$ by the tree $\beta$.

Using postfix notation, the tree $\alpha = \bigcup_{i=1}^{n} i \cdot \alpha_i \cup \{(0, x)\}$ is represented by $\alpha_1 \alpha_2 \ldots \alpha_n x$.

Definition 3.9. $t$ is a term over $\langle \Sigma, r \rangle$ iff $t = x \in \Sigma_0$ or $t = t_1 t_2 \ldots t_n x$ where $x \in \Sigma_n$ and $t_i$, $1 \le i \le n$, is a term. $\mathcal{T}_\Sigma$ is the set of terms over $\Sigma$. There is obviously a one-to-one correspondence between the terms over $\Sigma$ and the trees over $\Sigma$. If the corresponding tree of $t$ is a subtree of $\alpha$, we say $t$ is a term in $\alpha$.
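Definitions 3.2, 3.4 and 3.9 can be made concrete with a small sketch. It assumes a tree is stored as a Python dictionary that maps node addresses (tuples of positive integers, with the empty tuple standing for the identity $0$) to single-character labels. This encoding and the function names are illustrative choices, not part of the original text.

```python
def is_tree_domain(D):
    """Check Definition 3.2 for a finite set of addresses (tuples of ints >= 1)."""
    D = set(D)
    if not D:
        return False
    for a in D:
        if not a:
            continue  # the root address
        parent, last = a[:-1], a[-1]
        # (1) closed under prefixes: checking each parent suffices by induction
        if parent not in D:
            return False
        # (2) closed under left siblings: checking the nearest one suffices
        if last > 1 and parent + (last - 1,) not in D:
            return False
    return True


def branches(tree, a):
    """Number of branches at address a, i.e. max{ i | a.i in D }."""
    n = 0
    while a + (n + 1,) in tree:
        n += 1
    return n


def postfix(tree, a=()):
    """Postfix term of the subtree at address a (Definition 3.9)."""
    n = branches(tree, a)
    return "".join(postfix(tree, a + (i,)) for i in range(1, n + 1)) + tree[a]


# A root '$' with one child 'h', which in turn has three leaf children.
alpha = {(): "$", (1,): "h", (1, 1): "b", (1, 2): "h", (1, 3): "b"}
term = postfix(alpha)
```

In the postfix term the label of each node follows the terms of its subtrees, so the root label is always the last symbol.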

Definition 3.10. A regular tree grammar over $\langle \Sigma, r \rangle$ is a regular system $G_t = (V, r', P, S)$ satisfying the following conditions:

(1) $\langle V, r' \rangle$ is a finite ranked alphabet with $\Sigma \subseteq V$, and $r'|\Sigma = r$. The elements in $\Sigma$ and $V - \Sigma$ are called terminal and nonterminal symbols, respectively.

(2) $P$ is a finite set of productions of the form $\Phi \to \Psi$ where $\Phi$ and $\Psi$ are trees over $\langle V, r' \rangle$.

(3) $S \subseteq T_V$ is a finite set of start symbols.

Definition 3.11. The derivation $\alpha \overset{a}{\to} \beta$ is in $G_t$ iff there exists $\Phi \to \Psi$ in $P$ such that $\alpha/a = \Phi$ and $\beta = \alpha(a \leftarrow \Psi)$. $\alpha \Rightarrow \beta$ is a derivation iff there exist $a_1, a_2, \ldots, a_m$, $m \ge 0$, such that $\alpha = \alpha_0 \overset{a_1}{\to} \alpha_1 \overset{a_2}{\to} \cdots \overset{a_m}{\to} \alpha_m = \beta$ in $G_t$.

Definition 3.12. The language generated by $G_t = (V, r', P, S)$ over $\langle \Sigma, r \rangle$ is defined as $L(G_t) = \{ \alpha \in T_\Sigma \mid \text{there exists } X \in S \text{ such that } X \Rightarrow \alpha \}$.

Definition 3.13. A tree grammar $G_t = (V, r', P, S)$ over $\langle \Sigma, r \rangle$ is expansive iff each production in $P$ is of the form

$X_0 \to x\, X_1 \ldots X_{r(x)}$ or $X_0 \to x$, where $x \in \Sigma$

and $X_0, X_1, \ldots, X_{r(x)}$ are nonterminal symbols. In the first form, $x$ labels a node whose $r(x)$ branches carry $X_1, \ldots, X_{r(x)}$.

Following the definition given by Brainerd [66], a tree automaton is a finite automaton with many-to-one state transition functions.

Definition 3.14. Let $\langle \Sigma, r \rangle$ be a rank and $\Sigma = \{ \sigma_1, \sigma_2, \ldots, \sigma_k \}$. A finite $\Sigma$-automaton or tree automaton over $\Sigma$ is a system $M_t = (Q, f_1, \ldots, f_k, R)$ where

(1) $Q$ is a finite set of states,

(2) for each $i$, $1 \le i \le k$, $f_i$ is a relation on $Q^{r(\sigma_i)} \times Q$,

(3) $R \subseteq Q$ is a set of final states.

If each $f_i$, $1 \le i \le k$, is a function $f_i: Q^{r(\sigma_i)} \to Q$, then $M_t$ is deterministic; otherwise it is nondeterministic.

The construction procedure of a tree automaton for a regular tree grammar can be summarized as follows [50].

Algorithm 3.1. Construction of tree automaton.

Step 1. Obtain an expansive tree grammar $(V', r, P', S)$ for the given tree grammar $(V, r, P, S)$ over alphabet $\Sigma$.

Step 2. The equivalent nondeterministic tree automaton is $M_t = (V' - \Sigma, f_1, \ldots, f_k, \{S\})$ where $f_x(X_1, \ldots, X_n) \sim X_0$ if $X_0 \to x\, X_1 \ldots X_n$ is in $P'$.

The acceptance of a tree by a tree automaton is a backward procedure. It reads parallel branches simultaneously, then transfers to the states of their immediate predecessors.
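The backward (frontier-to-root) acceptance procedure can be sketched as follows. The sketch assumes trees are nested tuples `(label, child_1, ..., child_n)` and that each expansive production $X_0 \to x\,X_1 \ldots X_n$ is stored as a triple `(X0, x, (X1, ..., Xn))`. The three-rule grammar at the bottom is a made-up example of a vertical chain of `h` nodes under a root `$`; it is not the highway grammar used later in the chapter.

```python
def reachable_states(tree, rules):
    """Set of states the nondeterministic automaton can assign to the root of tree."""
    label, kids = tree[0], tree[1:]
    # Process all branches first (backward procedure from the frontier).
    kid_states = [reachable_states(kid, rules) for kid in kids]
    states = set()
    for lhs, x, rhs in rules:
        if x != label or len(rhs) != len(kids):
            continue
        # f_x(X_1..X_n) ~ X_0 applies when every branch can be in its state X_i.
        if all(X in s for X, s in zip(rhs, kid_states)):
            states.add(lhs)
    return states


def accepts(tree, rules, start="S"):
    """A tree is accepted when the start symbol is reachable at its root."""
    return start in reachable_states(tree, rules)


# Hypothetical expansive grammar: S -> $ A,  A -> h A,  A -> h
RULES = [("S", "$", ("A",)), ("A", "h", ("A",)), ("A", "h", ())]

chain = ("$", ("h", ("h", ("h",))))
broken = ("$", ("h", ("b",)))
```

Because the final state set is $\{S\}$, acceptance reduces to a membership test on the set computed at the root.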


3.3  Structure-Preserved  Error-Correcting  Tree  Automaton  (SPECTA) 

Let $D$ be a tree domain, $D \subset U$, and let $\Sigma$ be a set of terminal symbols. We define $T_\Sigma^D = \{ \alpha \mid \alpha \in T_\Sigma,\ D_\alpha = D \}$ to be the set of trees in the tree domain $D$. In this section, substitution error is described in terms of the transformation $S: T_\Sigma^D \to T_\Sigma^D$. For $a \in D$, $x \in \Sigma$, and $\alpha, \alpha' \in T_\Sigma^D$, we write $\alpha \overset{S_{a/x}}{\vdash} \alpha'$ if $\alpha'$ is the result of replacing the label on node $a$ of tree $\alpha$ by terminal symbol $x$. Furthermore, $S^k$ denotes the composition of $S$ with itself $k$ times.

The distance on trees in $T_\Sigma^D$, $d(\alpha, \alpha')$, is defined as the smallest integer $k$ for which $\alpha \overset{S^k}{\vdash} \alpha'$, if $\alpha$ and $\alpha'$ are two trees in $T_\Sigma^D$ for some $D \subset U$. The function $d$ is symmetric and satisfies the triangle inequality. Let $L$ be a tree language, and tree $\alpha' \notin L$. The essence of SPECTA is to search for a tree $\alpha$, $\alpha \in L$, such that

$$d(\alpha, \alpha') = \min_{\beta} \{\, d(\beta, \alpha') \mid \beta \in L,\ D_\beta = D_{\alpha'} \,\} \tag{3.1}$$

and reconstruct $\alpha'$ as $\alpha$. $d(\alpha, \alpha')$ in the above equation is also defined as the distance of $\alpha'$ from $L$, denoted $d(L, \alpha')$. $\alpha$ is called the minimum-distance correction of $\alpha'$ in $L$.
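Since only substitutions are allowed, the distance between two trees over the same tree domain is the number of nodes whose labels differ. The sketch below computes this distance and, for a language given as an explicit finite list of trees, the quantity in equation (3.1). Trees are assumed to be dictionaries from node addresses to labels; the function names are hypothetical, and enumerating the language is done only for illustration.

```python
def tree_distance(alpha, beta):
    """Smallest number of label substitutions turning alpha into beta."""
    if alpha.keys() != beta.keys():
        raise ValueError("trees must share the same tree domain")
    return sum(1 for a in alpha if alpha[a] != beta[a])


def distance_to_language(language, alpha_prime):
    """Equation (3.1) by brute force over a finite list of trees.

    Returns (distance, correction), or (None, None) when no tree in the
    language has the tree domain of alpha_prime.
    """
    best = (None, None)
    for beta in language:
        if beta.keys() != alpha_prime.keys():
            continue
        d = tree_distance(beta, alpha_prime)
        if best[0] is None or d < best[0]:
            best = (d, beta)
    return best


t1 = {(): "$", (1,): "h", (1, 1): "h"}
t2 = {(): "$", (1,): "b", (1, 1): "b"}
noisy = {(): "$", (1,): "h", (1, 1): "b"}
d, correction = distance_to_language([t2, t1], noisy)
```

Symmetry and the triangle inequality follow because the measure is a Hamming distance over the node labels.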

3.3.1 Minimum-Distance SPECTA

By adding terminal error production rules corresponding to substitution error transformations, the covering grammar $G_t' = (V', r', P', S)$ of a given tree grammar $G_t = (V, r, P, S)$ is constructed as follows:

Step 1. $V' = (V - \Sigma) \cup \Sigma'$, where $\Sigma' \supseteq \Sigma$ is a new set of terminal symbols.

Step 2. For each $y \in \Sigma'$ add to $P'$

$X_0 \to y\, X_1 \ldots X_{r(x)}$ if $X_0 \to x\, X_1 \ldots X_{r(x)}$ is in $P$,

or $X_0 \to y$ if $X_0 \to x$ is in $P$.

The language generated from $G_t'$ consists of the language $L(G_t)$ and its corresponding erroneous trees. Hence, $L(G_t')$ can be written as

$L(G_t') = \{ \alpha' \mid \alpha' \in T_{\Sigma'}, \text{ and } \exists\, \alpha \in L(G_t) \text{ such that } D_{\alpha'} = D_\alpha \}$

For a given tree grammar $G_t$, the SPECTA is formulated to accept trees in $L(G_t')$ and to generate a parse that consists of the minimum number of error productions. Assume that $\alpha'$ is an input tree; the operation of a SPECTA is a backward procedure of constructing a tree-like transition table from the frontier to the root of $\alpha'$. For each node $a \in D_{\alpha'}$, there is a corresponding set of triplets, denoted $t_a$, in the transition table. Each triplet $(X, \ell, k)$ is added to $t_a$ if $X$ is a candidate state of node $a$, $\ell$ is the minimum number of errors in subtree $\alpha'/a$ when node $a$ is represented by state $X$, and $k$ specifies the production rule used. The algorithm is given as follows.

Algorithm 3.2. Minimum-distance SPECTA

Input: $G_t = (V, r, P, S)$ and tree $\alpha'$.

Output: Transition table of $\alpha'$ and $d(L(G_t), \alpha')$.

Method:

Step 1. If $r[\alpha'(a)] = 0$, $\alpha'(a) = x$, then add to $t_a$

(a) $(X_0, 0, k)$ if $X_0 \to x$ is the $k$th rule in $P$.

(b) $(X_0, 1, k)$ if $X_0 \to y$ is the $k$th rule in $P$ and $y \neq x$.

Step 2. If $r[\alpha'(a)] = n > 0$, $\alpha'(a) = x$, then add to $t_a$

(a) $(X_0, \ell, k)$, if $X_0 \to x\, X_1 \ldots X_n$ is the $k$th rule in $P$ and $(X_1, \ell_1, k_1) \in t_{a \cdot 1}, \ldots, (X_n, \ell_n, k_n) \in t_{a \cdot n}$; then $\ell = \ell_1 + \cdots + \ell_n$.

(b) $(X_0, \ell, k)$, if $X_0 \to y\, X_1 \ldots X_n$ is the $k$th rule in $P$, $y \neq x$, and $(X_1, \ell_1, k_1) \in t_{a \cdot 1}, \ldots, (X_n, \ell_n, k_n) \in t_{a \cdot n}$; then $\ell = \ell_1 + \cdots + \ell_n + 1$.

Step 3. Whenever more than one item in $t_a$ has the same state, delete the item with the larger number of errors.

Step 4. If $(S, \ell, k) \in t_0$, then $d(L(G_t), \alpha') = \ell$. If no item in $t_0$ is of the form $(S, \ell, k)$, then no tree in $L(G_t)$ is in tree domain $D_{\alpha'}$, and the input tree is rejected.

The minimum-distance correction of $\alpha'$ can easily be traced out from the transition table.
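Algorithm 3.2 amounts to a bottom-up dynamic program over the input tree. The sketch below follows Steps 1 to 4 under the assumptions that trees are nested tuples `(label, child_1, ..., child_n)` and that the $k$th production $X_0 \to x\,X_1 \ldots X_n$ is the $k$th triple `(X0, x, (X1, ..., Xn))` of a list. Only the table of the root is returned, so tracing out the correction is not shown. The grammar is the same made-up chain grammar used for illustration elsewhere, not the highway grammar of Example 3.1.

```python
def specta_table(tree, rules):
    """Transition-table entry t_a for the root of tree: state -> (errors, rule no.)."""
    label, kids = tree[0], tree[1:]
    kid_tables = [specta_table(kid, rules) for kid in kids]
    table = {}
    for k, (lhs, x, rhs) in enumerate(rules, start=1):
        if len(rhs) != len(kids):
            continue  # substitutions never change the tree domain
        if any(X not in t for X, t in zip(rhs, kid_tables)):
            continue
        # Steps 1 and 2: errors in the branches, plus one if the label differs.
        errors = sum(t[X][0] for X, t in zip(rhs, kid_tables))
        errors += 0 if x == label else 1
        # Step 3: keep only the entry with the fewest errors for each state.
        if lhs not in table or errors < table[lhs][0]:
            table[lhs] = (errors, k)
    return table


def specta_distance(tree, rules, start="S"):
    """Step 4: d(L(G_t), tree), or None when the input tree is rejected."""
    entry = specta_table(tree, rules).get(start)
    return None if entry is None else entry[0]


# Hypothetical grammar: (1) S -> $ A, (2) A -> h A, (3) A -> h
RULES = [("S", "$", ("A",)), ("A", "h", ("A",)), ("A", "h", ())]

noisy = ("$", ("h", ("b", ("h",))))      # one label on the chain is wrong
two_branches = ("$", ("h",), ("h",))     # no tree of L(G_t) has this domain
```

Each node is visited once and each rule is tried once per node, so the work is proportional to the number of nodes times the number of productions.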


The tree grammar in the following example is a part of the highway grammar used in Section 3.3.3. In the meantime, we use it as an example here to illustrate the operation of SPECTA and to demonstrate the highway pattern recognition procedure that will be discussed in Section 3.3.3.

Example 3.1. Consider a set of vertical line patterns as given in Figure 3.1. Assume that elements in the 4 x 4 array are connected as a tree shown in Figure 3.2. Thus, each pattern has its corresponding tree representation. For example, pattern (b) in Figure 3.1 can be represented by the tree shown in Figure 3.3, where nodes labeled by symbol "b" represent blank or nonhighway elements, and nodes labeled by "h" represent highway elements.
The  tree  grammar  that  generates  these  tree  representations  can  be 
written  as: 

$G_H = (V, r, P, S)$ over $\langle\Sigma, r\rangle$, where

$V = \{S, A_0, A_1, A_2, A_3, X_0, I_1, I_2, I_3, \$, b, h\}$

$\Sigma = \{\$, b, h\}$

$r(\$) = 1$,  $r(b) = \{0,1,3\}$,  $r(h) = \{0,1,3\}$

P:  (the numbered productions are drawn as tree diagrams in the original and are not legible in this copy)


Given a noisy pattern shown in Figure 3.4(a), let $\alpha'$ be its tree
representation; $\alpha'$ is given in Figure 3.4(b). The transition table of
$\alpha'$ resulting from using the minimum-distance SPECTA with respect to grammar
$G_H$ is shown in Figure 3.5. Since (S,3,2) is in $t_0$, $\alpha'$ is accepted by the
SPECTA and the number of errors in $\alpha'$ is 3. Let the minimum-distance
correction of $\alpha'$ be called $\alpha$. The generation of $\alpha$ from the transition
table is illustrated in Figure 3.6.


3.3.2  Maximum-Likelihood SPECTA

When the probability distribution of patterns and the deformation
probabilities on each terminal are available, error-correcting parsing
based on the maximum-likelihood criterion may provide better recognition
performance. Definitions of stochastic grammar, terminal deformation
probabilities, and the maximum-likelihood criterion have been introduced
in Chapter 2. An approach similar to MLECP is used in formulating SPECTA
based on the maximum-likelihood criterion.

The expansive stochastic tree grammars and languages defined in
[72] are briefly reviewed.

Definition 3.15.  A stochastic tree grammar $G_s = (V,r,P,S)$ over
$\langle\Sigma,r\rangle$ is expansive if and only if each rule in P is of the form

$X_0 \xrightarrow{p} x(X_1 \cdots X_{r(x)})$  or  $X_0 \xrightarrow{p} x$,  where $x \in \Sigma$

and $X_0, X_1, \ldots, X_{r(x)} \in V - \Sigma$ are nonterminals.

Definition 3.16.

$L(G_s) = \{(\alpha, p(\alpha)) \mid \alpha \in T_\Sigma,\ S \xRightarrow{p_i} \alpha,\ i = 1, \ldots, k,\ p(\alpha) = \sum_{i=1}^{k} p_i\}$

where k is the number of all distinctly different derivations of $\alpha$
from S, and $p_i$ is the probability associated with the ith distinct
derivation of $\alpha$ from S.

Assume that the occurrence of a substitution error on a terminal is
independent of its neighboring terminals. Fung and Fu [3*] define a
substitution error made on strings to be a stochastic mapping $\sigma: \Sigma \to \Sigma$
such that $\sigma(a) = b$, for a and b in $\Sigma$, with probability $q(b \mid a)$ and furthermore,

$$\sum_{b \in \Sigma} q(b \mid a) = 1 \qquad (3.2)$$

The same definition can be applied to model a substitution error made on
trees. Furthermore, assume that $t = t_1 \cdots t_n x$ is a term over $\langle\Sigma,r\rangle$; then

$$\sigma(t_1 \cdots t_n x) = \sigma(t_1) \cdots \sigma(t_n)\,\sigma(x) \qquad (3.3)$$

If two trees $\alpha = t_1 \cdots t_n x$ and $\alpha' = t_1' \cdots t_n' x'$ are both in $T_\Sigma^D$, the
probability of $\alpha'$ being the noisy deformed tree of $\alpha$ is

$$q(\alpha' \mid \alpha) = q(t_1' \mid t_1) \cdots q(t_n' \mid t_n)\, q(x' \mid x) \qquad (3.4)$$

We further have

$$\sum_{\alpha' \in T_\Sigma^D} q(\alpha' \mid \alpha) = 1 \quad \text{for any } \alpha \in T_\Sigma^D,\ D \subset U \qquad (3.5)$$

Figure 3.6  The generation of $\alpha$, the correction of $\alpha'$


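For a given stochastic tree grammar $G_s = (V,r,P,S)$ over $\langle\Sigma,r\rangle$, when
the deformation probabilities $q(y \mid x)$ are known for all $x \in \Sigma$ and $y \in \Sigma'$,
the stochastic expanded grammar is $G_s' = (V',r',P',S)$ over $\langle\Sigma',r'\rangle$, where
$V' = (V - \Sigma) \cup \Sigma'$ and $\Sigma' \supseteq \Sigma$ is the set of terminal symbols, and for all
$y \in \Sigma'$,

$X_0 \xrightarrow{p'} y$ is in P' if $X_0 \xrightarrow{p} x$ is in P, or

$X_0 \xrightarrow{p'} y(X_1 \cdots X_{r(x)})$ is in P' if $X_0 \xrightarrow{p} x(X_1 \cdots X_{r(x)})$ is in P,

and $p' = p\,q(y \mid x)$.

The language generated from $G_s'$ can be written as:

$$L(G_s') = \{(\alpha', p'(\alpha')) \mid \alpha' \in T_{\Sigma'},\ p'(\alpha') = \sum_{\alpha \in L(G_s),\ D_{\alpha'} = D_\alpha} q(\alpha' \mid \alpha)\,p(\alpha)\} \qquad (3.6)$$

Suppose that the given noisy input tree $\alpha'$ is in tree domain D. The
maximum-likelihood decision rule in this case is to choose a tree $\alpha$ in
$L(G_s)$ of domain D, i.e., $\alpha \in L(G_s)$ and $\alpha \in T_\Sigma^D$, such that

$$q(\alpha' \mid \alpha)\,p(\alpha) = \max_{\beta} \{ q(\alpha' \mid \beta)\,p(\beta) \mid \beta \in L(G_s) \cap T_\Sigma^D \} \qquad (3.7)$$

We call this value, $q(\alpha' \mid \alpha)\,p(\alpha)$, the probability of $\alpha'$ being a noise
deformed tree of $L(G_s)$ and denote it as $q(\alpha' \mid G_s')$.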

The structure-preserved maximum-likelihood SPECTA is given as follows.

Algorithm 3.3.  Maximum-Likelihood SPECTA

Input:  (1)  Stochastic tree grammar $G_s = (V,r,P,S)$ over $\langle\Sigma,r\rangle$.

        (2)  Deformation probabilities $q(y \mid x)$ for all $x \in \Sigma$, $y \in \Sigma'$.

        (3)  Input tree $\alpha'$.

Output:  Transition table of $\alpha'$ and $q(\alpha' \mid G_s')$.

Method:

Step 1.  If $r[\alpha'(a)] = 0$, $\alpha'(a) = y$, then add $(X_0, p', k)$ to $t_a$ if
$X_0 \xrightarrow{p} x$ is the kth rule in P and $p' = p\,q(y \mid x)$.

Step 2.  If $r[\alpha'(a)] = n$, $n > 0$, $\alpha'(a) = y$, then add $(X_0, p', k)$ to $t_a$ if
$X_0 \xrightarrow{p} x(X_1 \cdots X_n)$ is the kth rule in P and $(X_1,p_1',k_1) \in t_{a\cdot 1}, \ldots, (X_n,p_n',k_n) \in t_{a\cdot n}$;
then $p' = p_1' \cdots p_n' \cdot p \cdot q(y \mid x)$.

Step 3.  Whenever more than one item in $t_a$ has the same state,
delete the items associated with smaller probability.

Step 4.  If $(S,p',k) \in t_0$, then $q(\alpha' \mid G_s') = p'$. If no item in $t_0$ is
associated with the start state S, then no tree in $L(G_s)$
is in tree domain $D_{\alpha'}$, and the input tree is rejected.
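
A minimal Python sketch of the maximum-likelihood table construction may help make the steps concrete. The encodings are assumptions of this sketch rather than part of the text: trees are nested (label, children) pairs, rules are (lhs, p, label, rhs) tuples numbered from 1, and the deformation probabilities are a dictionary keyed by (y, x) for q(y|x). The toy grammar and probabilities are hypothetical.

```python
def ml_specta(rules, q, start_state, tree):
    """Return (probability, rule_number) for the start state at the root, or None."""
    def table(node):
        y, kids = node
        kid_tables = [table(kid) for kid in kids]
        items = {}                                   # state -> (probability, rule number)
        for k, (lhs, p, x, rhs) in enumerate(rules, start=1):
            if len(rhs) != len(kids):                # structure must be preserved
                continue
            prob = p * q.get((y, x), 0.0)            # p' = p q(y|x)
            for X, t in zip(rhs, kid_tables):        # times the successors' probabilities
                prob *= t[X][0] if X in t else 0.0
            # Step 3: keep only the most probable item per state
            if prob > 0.0 and prob > items.get(lhs, (0.0, 0))[0]:
                items[lhs] = (prob, k)
        return items

    return table(tree).get(start_state)


# Hypothetical stochastic grammar: S -> $(A) [1.0], A -> h(A) [0.5], A -> h [0.5]
RULES = [("S", 1.0, "$", ("A",)), ("A", 0.5, "h", ("A",)), ("A", 0.5, "h", ())]
# Hypothetical deformation probabilities q(y|x), keyed by (y, x)
Q = {("$", "$"): 1.0, ("h", "h"): 0.9, ("b", "h"): 0.1}
best = ml_specta(RULES, Q, "S", ("$", [("b", [("h", [])])]))
```

As in the minimum-distance case, a result of None corresponds to rejecting the input tree because no item for the start state reaches the root.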

3.3.3  Application of SPECTA to LANDSAT Data Interpretation

Recently, syntactic methods have been used to analyze and interpret
data obtained from the earth resource technology satellite (LANDSAT)
[39,53]. The input data used in [39,53] are the results of pointwise
classification [73]. Each pixel collected by LANDSAT represents a ground
area of approximately 60 x 70 m². According to spectral and/or temporal
measurements of the object, a pixel is then classified into classes such as
water, cloud, downtown, concrete, or grass. Due to the resolution
size, spectral signals of smaller objects are usually composed of the
reflectance of several different kinds of ground cover. For instance,
the spectral signal of a segment of a highway actually results from a
combined reflectance of concrete surface, grass, and transportation vehicles.
Consequently, the variation in size of smaller objects and their surroundings
changes their reflectances, and thus their spectral properties, from point to
point. This uncertainty causes some difficulty in setting a threshold for
classification based on the spectral information of individual points only.

One example of the results of pointwise classified highway patterns is
given in Figure 3.7, which covers the northern part of Grand
Rapids, Michigan. Each symbol "H" represents a pixel that is classified as
a segment of highway. Figure 3.8, which is obtained from the official
highway map, indicates the major divided highways of the same area.

As illustrated in Figure 3.7, the inadequate resolution of highways
and the mass of scattered concrete and grass-mixed objects other than
highways result in discontinuity of highways and spurious points from
pointwise classification. Syntactic methods have the advantage of using
contextual and structural information contained in patterns for recognition
purposes. In order to discriminate highways or rivers from other objects
having similar spectral properties, a tree system approach is proposed by
Li and Fu [39]. The method demonstrates fairly good results in distinguishing
rivers from other watery areas and from some modern buildings with glass walls,
whose reflecting surfaces are similar to water, but performs poorly in analyzing
highway patterns. Evidently, highway patterns in some cases are too noisy to be
effectively analyzed by conventional training and parsing methods.

In this experiment, an input picture is processed window by window,
where the window size is an 8 x 8 array of pixels. The labels on single pixels
are used as primitives. Thus, we have two kinds of primitives: "h"
represents a highway pixel and "b" represents a blank or nonhighway pixel.
Initially, sample patterns of vertical, horizontal, or diagonal lines are
used as our positive samples (or desired patterns). Assume that only a
single segment of highway, or at most two intersecting highways, appearing
within a window is considered. Hence, some patterns consisting of two
such sample patterns are also added to the set of positive samples.

Similar to the procedure illustrated in Example 3.1, an array of
primitives in a window is represented as a tree. Each primitive becomes
a labeled node in the tree representation. We fix the tree domain to be
$D_H$, and allow each node label to be either "h" or "b". In total there are $2^{64}$
tree representations in $T_\Sigma^{D_H}$, where $\Sigma = \{b, h, \$\}$ and $\$$ is the start
terminal. Hence, the set of all possible patterns in an 8 x 8 window
and the set of all labeled trees in $T_\Sigma^{D_H}$ are in one-to-one correspondence.
Figure 3.9 illustrates the correspondence between points in the 8 x 8 array
and nodes in the tree domain $D_H$. The set of positive sample patterns can
now be transformed into a set of positive sample trees of domain $D_H$.

The construction of a pattern grammar, when an error-correcting parser is
used as the recognizer, is a little different from the regular procedure
[73]. With some presumed patterns as positive samples, the language
generated from the constructed grammar must be as close to the set of
positive samples as possible. Otherwise, there is always the possibility
that the noisy pattern finds its best match in the set of "unwanted
sentences" (sentences in L(G) but not in the set of positive samples).

The tree grammar, $G_H$, constructed to generate positive sample trees
is given in Appendix C. Figure C.1 in Appendix C illustrates some
typical patterns whose tree representations are in $L(G_H)$. In Figure C.1,
"Group A" denotes the group of patterns that are generated from rules of
the form $S \to \$(\,\cdot\,)$ whose right-hand-side nonterminal is indexed by
$i = 1, \ldots, 8$, "Group B" denotes those generated from a second family of
rules of the form $S \to \$(\,\cdot\,)$, etc.

Figure 3.9  Nodes in tree domain $D_H$ and their corresponding
positions in the 8 x 8 array.


Note that the tree whose nodes are all labeled with "b" also belongs
to $L(G_H)$. Actually, the pattern it represents is an undesired one, i.e.,
a window with no highway passing through. It is our negative sample
tree, denoted as X. We may categorize the sets of trees that have been
introduced so far as follows: $T_\Sigma^{D_H}$ is the universe we are working on,
where $\Sigma = \{\$, h, b\}$ and $\$$ is the start terminal. $T_\Sigma^{D_H} \cap (L(G_H) - \{X\})$
is the positive sample set, denoted as $S^+$. $\{X\}$ is the negative sample
set, denoted as $S^-$ [74]. Let the set $T_\Sigma^{D_H} \cap \overline{(S^+ \cup S^-)}$ be denoted as N;
then $N = T_\Sigma^{D_H} \cap \overline{L(G_H)}$. Apparently, elements in N are noisy patterns,
which are normally unrecognizable using a conventional parsing scheme.
The idea of using an error-correcting tree automaton is to measure the
distance between an input pattern and patterns in $S^+ \cup S^-$. If the input
pattern is in N, it will further be reconstructed to its best matching
pattern in $S^+$, in terms of the minimum number of mislabeled nodes, or
erased if its best matching pattern is in $S^-$. Since $S^-$ is also in $L(G_H)$,
we may measure the distance of the input tree from $S^+$ and $S^-$ at the same
time.


Assume that an input picture is scanned column by column from left
to right. After an 8 x 8 array of primitives has been processed by SPECTA,
we erase the top left 5 x 5 array of pixels of the original pattern, i.e.,
replace "H" points by blanks, and superimpose the reconstructed pattern
on the original window. The scanning window then moves down five rows of
pixels so that the pattern appearing in the window is processed by the
same procedure. After eight columns of pixels have been scanned, the
process is repeated from the sixth column. By doing this, the
error-correcting scheme is furnished with contextual information, not only
within the window, but also between the window and its surroundings. Using the
pattern shown in columns 141-148, rows 1-18 of Figure 3.7, the
window-by-window correcting process is illustrated in Figure 3.10. The flowchart
of the entire recognition procedure is given in Figure 3.11.
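
The overlapping scan described above can be sketched as follows. This is a minimal sketch, not the original Fortran IV program: the picture is assumed to be a list of rows of "h"/"b" labels, windows that would not fit entirely inside the picture are simply skipped (the text does not say how borders are handled), and correct is a hypothetical stand-in for the SPECTA call that returns the reconstructed 8 x 8 pattern (all "b" when the window is erased).

```python
def window_origins(n_rows, n_cols, size=8, step=5):
    """Yield (row, col) of each window's top-left pixel, one column strip at a time."""
    for col in range(0, n_cols - size + 1, step):      # next strip starts at the sixth column
        for row in range(0, n_rows - size + 1, step):  # window moves down five rows
            yield row, col


def process(picture, correct, size=8, step=5):
    """Correct the picture in place, window by window, and return it."""
    n_rows, n_cols = len(picture), len(picture[0])
    for row, col in window_origins(n_rows, n_cols, size, step):
        window = [r[col:col + size] for r in picture[row:row + size]]
        fixed = correct(window)                        # hypothetical SPECTA stand-in
        for i in range(step):                          # erase the top-left 5 x 5 block
            for j in range(step):
                picture[row + i][col + j] = "b"
        for i in range(size):                          # superimpose the reconstruction
            for j in range(size):
                if fixed[i][j] == "h":
                    picture[row + i][col + j] = "h"
    return picture
```

With an 18-row, 8-column strip, window_origins produces windows starting at rows 0, 5 and 10 (rows 1-8, 6-13 and 11-18 in 1-based numbering), so each window shares three rows with the next one, which is how the context between adjacent windows is carried over.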

This error-correcting scheme for the highway recognition problem is
programmed in Fortran IV on a CDC 6500 computer and tested using the data
shown in Figure 3.7. The result is shown in Figure 3.12. There are
80 x 160 pixels in the input data. The CPU time for processing is 150 sec.

We also use the grammar $G_H$, which is trained from the Grand Rapids data,
to analyze some other noisy data, such as data obtained from Lafayette,
Indiana. The pointwise classified data of Lafayette are shown in Figure 3.13,
which contains 125 x 125 pixels. The result of the error-correcting
analysis is shown in Figure 3.14. The CPU time used is 101 sec. For
comparison, we use the highway map shown in Figure 3.15 as ground truth.

There are several remarks to be made about this highway recognition
example. (1) Originally, the presumed positive samples were more numerous than those
generated from $G_H$ in Appendix C. Many patterns of two intersecting highways
were considered. After the originally constructed grammar was tested on the
Grand Rapids data, we removed production rules that were
infrequently (or never) accessed and reduced the highway grammar to $G_H$ in
Appendix C.


Figure 3.13  Pointwise classified data of the Lafayette area


(2) An alternative solution, if the error-correcting scheme is not used,
is to obtain a large set of positive samples and negative samples from
training data, and to construct a grammar so that all the negative samples
are excluded from the language and all the positive samples are included.
It is difficult to construct such a grammar when patterns are irregular.
Besides, due to the large variation of input data, a slight difference
in the training set will cause some patterns to be rejected during parsing.
(3) For our convenience, we use a fixed tree domain. A deleted segment
of highway is taken as a substitution of a highway primitive by a blank
primitive. A spurious H point is considered as a substitution of a blank
primitive by a highway primitive. A generalized ECTA which handles
substitution errors as well as deletion and insertion errors is introduced
in the following section.

3.4  Generalized Error-Correcting Tree Automaton
3.4.1  Distance on Trees

In Section 3.3, since only substitution transformations are considered,
tree distance is measurable only when two trees are in the same domain. Our
purpose in this section is to define transformations between trees in
different tree domains such that tree distance is measurable for any two
given trees. Consequently, the transformations we define can be
applied to a tree automaton.

Errors on trees are defined to be of the following five types:

(1)  the substitution of the label of a node by another terminal
symbol,

(2)  the insertion of an extraneous labeled node between a node and
its immediate predecessor,

(3)  the insertion of an extraneous labeled node to the left of all
the immediate successors of a node,

(4)  the insertion of an extraneous labeled node to the right of a
node,

(5)  the deletion of a node of rank 1 or 0.

The three insertion operations in rules (2), (3) and (4) are
named stretch, branch, and split, respectively, according to the
relative position of the inserted node in the original tree. Apparently,
the inverse operation of any type of insertion is deletion, and the
inverse of the deletion operation is one of the three types of insertion.

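We define five transformations S, T, P, B, D, from $T_\Sigma$ to the subsets
of $T_\Sigma$, to describe substitution, stretch, split, branch, and deletion
errors, respectively. Let $\alpha$ be a tree over $\langle\Sigma,r\rangle$, $a \in U$, $y \in \Sigma$, $c \cdot i = a$, and
$r[\alpha(c)] = n$; then the error transformations are defined as follows:

(1)  $S_{a/y}(\alpha) = \alpha(a \leftarrow \{(0,y)\} \cup \{i \cdot \alpha/a{\cdot}i \mid 1 \le i \le r[\alpha(a)]\})$,

(2)  $T_{a/y}(\alpha) = \alpha(a \leftarrow \{(0,y)\} \cup \{1 \cdot \alpha/a\})$,

(3)  $B_{a/y}(\alpha) = \alpha(a \leftarrow \{(0,\alpha(a)), (1,y)\} \cup \{(i+1) \cdot \alpha/a{\cdot}i \mid 1 \le i \le r[\alpha(a)]\})$,

(4)  $P_{a/y}(\alpha) = \alpha(c{\cdot}(n+1) \leftarrow \alpha/c{\cdot}n) \cdots (c{\cdot}(i+1) \leftarrow \alpha/c{\cdot}i)(a \leftarrow (0,y))$,

(5)  $D_a(\alpha) = \alpha(c{\cdot}i \leftarrow \alpha/c{\cdot}(i+1)) \cdots (c{\cdot}(n-1) \leftarrow \alpha/c{\cdot}n)(c{\cdot}n \leftarrow \Lambda)$ if $r[\alpha(a)] = 0$,
     $D_a(\alpha) = \alpha(a \leftarrow \alpha/a{\cdot}1)$ if $r[\alpha(a)] = 1$.

The S, T, B, P and D transformations are illustrated in Figure 3.16 (a),
(b), (c), (d) and (e), respectively.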

We write $\alpha \vdash_A \beta$ if $\beta$ is in $A(\alpha)$, where $A \in \{S,T,B,P,D\}$, and
further denote $\alpha \vdash_A^k \beta$ for $k \ge 0$ if $\beta$ is derived from $\alpha$ by applying
k transformations, where $A^k$ denotes the composition of A with itself k
times. The distance on trees over $\Sigma$, $d(\alpha,\beta)$, is defined as the smallest
integer k for which $\alpha \vdash_A^k \beta$, if $\alpha$ and $\beta$ are two trees in $T_\Sigma$.

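For example, given two trees $\alpha$ and $\beta$, where $\alpha = \{(0,\$), (1,x_1), (1{\cdot}1,p),
(2,x_2), (2{\cdot}1,q)\}$ and $\beta = \{(0,\$), (1,v), (1{\cdot}1,p), (1{\cdot}2,q), (2,q)\}$, with the labels
$x_1 \neq v$ and $x_2$ of nodes 1 and 2 as drawn in Figure 3.17, then
$d(\alpha,\beta) = 3$, since $\beta = D_2(P_{1\cdot 1/q}(S_{1/v}(\alpha)))$ and no other derivation
of $\beta$ from $\alpha$ costs fewer than three transformations. Trees $\alpha$ and $\beta$ are
shown in Figure 3.17.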
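
The five error transformations can be tried out with a small Python sketch. It rests on assumptions that are not in the text: trees are nested (label, children) pairs, node addresses are tuples of 1-based successor indices, split is read as inserting a new leaf immediately to the right of the addressed node, and the labels "u" and "w" in the sample tree are hypothetical placeholders.

```python
def _edit(tree, addr, fn):
    """Apply fn to the subtree at address addr and return the rebuilt tree."""
    label, kids = tree
    if not addr:
        return fn(tree)
    i = addr[0] - 1
    return (label, kids[:i] + [_edit(kids[i], addr[1:], fn)] + kids[i + 1:])


def substitute(tree, addr, y):          # S: relabel the node
    return _edit(tree, addr, lambda t: (y, t[1]))


def stretch(tree, addr, y):             # T: new node between addr and its predecessor
    return _edit(tree, addr, lambda t: (y, [t]))


def branch(tree, addr, y):              # B: new leaf left of all successors of addr
    return _edit(tree, addr, lambda t: (t[0], [(y, [])] + t[1]))


def split(tree, addr, y):               # P: new leaf to the right of addr (not the root)
    i = addr[-1]
    return _edit(tree, addr[:-1],
                 lambda t: (t[0], t[1][:i] + [(y, [])] + t[1][i:]))


def delete(tree, addr):                 # D: remove a node of rank 0 or 1 (not the root)
    i = addr[-1] - 1

    def fn(t):
        node = t[1][i]
        if len(node[1]) > 1:
            raise ValueError("only nodes of rank 0 or 1 can be deleted")
        return (t[0], t[1][:i] + node[1] + t[1][i + 1:])

    return _edit(tree, addr[:-1], fn)


# Sample tree; "u" and "w" are placeholder labels chosen for this sketch.
alpha = ("$", [("u", [("p", [])]), ("w", [("q", [])])])
beta = delete(split(substitute(alpha, (1,), "v"), (1, 1), "q"), (2,))
```

Here three transformations (one substitution, one split, one deletion) turn alpha into a tree with node 1 labeled "v" over successors "p" and "q", and node 2 labeled "q".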

Let L be a tree language; a tree $\beta$ not in L can be derived from
some tree in L by a sequence of error transformations. The distance
between $\beta$ and L is defined as

$$d(L,\beta) = \min\{d(\alpha,\beta) \mid \alpha \in L\} \qquad (3.8)$$


Example 3.2.  Assume a directed graph of labeled vertices and
unlabeled branches as shown in Figure 3.18. The tree grammar
constructed to generate such patterns is given as follows:

$G_t = (V, r, P, S)$

$V = \{S, A, B, C, D, E, H, \$, a, b, c, d, e, h\}$
$r(\$) = \{1\}$, $r(a) = \{0,3\}$, $r(b) = \{0,3,4\}$, $r(c) = \{0,2\}$,
$r(d) = \{0,2,3\}$, $r(e) = \{1,2\}$, $r(h) = \{1,2\}$

P:  (the productions are drawn as tree diagrams in the original and are not legible in this copy)


Suppose that $\alpha'$ is the given distorted pattern. The successive graphs
obtained after each tree error transformation is applied are illustrated in
Figure 3.19.

3.4.2  The Formulation of GECTA

For any given $\alpha'$ not in language L, the generalized error-correcting
tree automaton (GECTA) is formulated to search for $\alpha$ in L such that the
distance of $\alpha$ from $\alpha'$ is the smallest among all the sentences in L. That
is,

$$d(\alpha,\alpha') = \min_{\beta \in L} d(\beta,\alpha') \qquad (3.9)$$

Note that the condition $D_\beta = D_{\alpha'}$ in equation (3.1) is removed here.

Before introducing the algorithms for GECTA, a normal form of trees
and tree grammars, called the binary form, is defined.

Figure 3.19  The sequence of transformations of $\alpha'$ from $\alpha$

(A)  Tree Binary Form

A binary tree grammar is defined as follows:

Definition 3.17.  A tree grammar $G_b = (V_b, r_b, P_b, S)$ over $\langle\Sigma_b, r_b\rangle$ is
said to be in binary form if

(1)  a pseudo symbol * is in $\Sigma_b$,

(2)  $r_b = \{0,1,2\}$,

(3)  each production in $P_b$ is in one of the following forms:

     (a)  $U_1 \to X_1\,*$

     (b)  $U_2 \to U_1 X_1\,*$

     (c)  $X_0 \to U_1 X_1\,x$

     (d)  $X_0 \to X_1\,x$

     (e)  $X_0 \to x$

where $U_1, U_2, X_0, X_1$ are in $V_b - \Sigma_b$ and $x \in \Sigma_b - \{*\}$.

Let $\alpha$ be a tree in $T_\Sigma$, and * be a pseudo symbol not in $\Sigma$. The
conversion of tree $\alpha$ into its binary form $\alpha^*$ is given in the following
algorithm.

Algorithm 3.4.  Binary Form Conversion

Input:  Tree $\alpha$ in $T_\Sigma$.

Output:  $\alpha^*$, the binary form of $\alpha$.

Method:  Repeat CONVERT until $\alpha$ is in binary form.

<CONVERT>:  (1)  If $t_0 = x$ is a term in $\alpha$, then $t_0$ is said to be
in binary form.

(2)  If $t_1, \ldots, t_n$ are already in binary form, and
$t_0 = (t_1 \cdots t_n)\,x$ is a term in $\alpha$, $x \in \Sigma_n$, then:

If n = 1, then $t_0 = (t_1)\,x$ is said to be in binary form.

If n > 1, then define $t_1^*, \ldots, t_{n-1}^*$ such that

     $t_1^* = (t_1)\,*$
     $t_2^* = (t_1^*\, t_2)\,*$
     ...
     $t_{n-1}^* = (t_{n-2}^*\, t_{n-1})\,*$
     $t_0 = (t_{n-1}^*\, t_n)\,x$
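
The conversion can be sketched recursively in Python. The sketch assumes trees are encoded as nested (label, children) pairs, which is an encoding chosen here for illustration; to_binary and leaf are hypothetical names.

```python
def to_binary(tree, pseudo="*"):
    """Return the binary form of a (label, [subtrees]) tree."""
    label, kids = tree
    kids = [to_binary(kid, pseudo) for kid in kids]   # successors first
    if len(kids) <= 1:                                # rank 0 or 1: already binary
        return (label, kids)
    acc = (pseudo, [kids[0]])                         # t1* = (t1) *
    for kid in kids[1:-1]:
        acc = (pseudo, [acc, kid])                    # ti* = (t(i-1)* ti) *
    return (label, [acc, kids[-1]])                   # t0 = (t(n-1)* tn) x


def leaf(symbol):
    return (symbol, [])


ternary = ("x", [leaf("a"), leaf("b"), leaf("c")])
binary = to_binary(ternary)
```

For a node of rank 3, the first successor is wrapped under one pseudo node, the first two under a second pseudo node, and the original label keeps the last successor as its right branch, so every node in the result has rank at most 2.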


The construction of the binary tree grammar $G_b = (V_b, r_b, P_b, S)$ from
a given tree grammar $G_t = (V,r,P,S)$ in expansive form is as follows:

Let $\Sigma_b = \Sigma \cup \{*\}$, and add V to $V_b$. $P_b$ is constructed as follows:

Step 1.  For each production in P of the form $X_0 \to x$, $x \in \Sigma_0$, add
$X_0 \to x$ to $P_b$ and $r_b(x) = 0$.

Step 2.  For each production in P of the form $X_0 \to X_1\,x$, $x \in \Sigma_1$, add
$X_0 \to X_1\,x$ to $P_b$ and $r_b(x) = 1$.

Step 3.  For each production in P of the form $X_0 \to X_1 \cdots X_n\,x$,
$x \in \Sigma_n$, $n \ge 2$, add

     $U_{01} \to X_1\,*$
     $U_{02} \to U_{01} X_2\,*$
     ...
     $U_{0n-1} \to U_{0n-2} X_{n-1}\,*$
     $X_0 \to U_{0n-1} X_n\,x$

to $P_b$, and $r_b(*) = \{1,2\}$, $r_b(x) = 2$. Add $\{U_{0i} \mid 1 \le i \le n-1\}$
to $V_b$.

Let the binary form of a tree language $L(G_t)$ be denoted as $L_b(G_t)$
and $G_b$ be the binary form of grammar $G_t$; then $L_b(G_t) = L(G_b)$.
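
The three construction steps can be sketched as a small Python function. The rule encoding (lhs, label, rhs) and the way the new nonterminals are named (one fresh family per rule, "U<rule index>_<i>") are assumptions of this sketch; the text names them after the left-hand side instead.

```python
def to_binary_grammar(rules, pseudo="*"):
    """rules: list of (lhs, label, rhs) in expansive form; returns binary-form rules."""
    out = []
    for idx, (x0, x, rhs) in enumerate(rules):
        n = len(rhs)
        if n <= 1:                                   # Steps 1 and 2: copy unchanged
            out.append((x0, x, tuple(rhs)))
            continue
        prev = f"U{idx}_1"                           # Step 3: chain of pseudo-symbol rules
        out.append((prev, pseudo, (rhs[0],)))
        for i in range(1, n - 1):
            cur = f"U{idx}_{i + 1}"
            out.append((cur, pseudo, (prev, rhs[i])))
            prev = cur
        out.append((x0, x, (prev, rhs[-1])))
    return out


# Hypothetical rules: S -> $(B D), D -> d, A -> a(H D B)
binary_rules = to_binary_grammar([
    ("S", "$", ("B", "D")),
    ("D", "d", ()),
    ("A", "a", ("H", "D", "B")),
])
```

Naming the new nonterminals per rule rather than per left-hand side avoids clashes when one nonterminal has several productions of rank two or more; this is a design choice of the sketch only.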

Example 3.3.  Character E

A character E is shown in Figure 3.21 (a). Based on the pattern
primitives defined in Figure 3.20, the tree representation $\alpha$ of
E is shown in Figure 3.21 (b). The tree grammar $G_E$ which generates
the character E of different sizes is as follows:

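$G_E = (V,r,P,S)$ over $\langle\Sigma,r\rangle$

$V = \{S, B, C, D, \$, b, d\}$,  $\Sigma = \{\$, b, d\}$

$r(\$) = 2$,  $r(b) = \{1,2\}$,  $r(d) = \{0,1\}$

P:  (the productions are drawn as tree diagrams in the original and are not legible in this copy)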

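The binary form of $\alpha$, $\alpha^*$, is given in Figure 3.22, and the binary
form of $G_E$, $G_b$, is:

$G_b = (V_b, r_b, P_b, S)$ over $\langle\Sigma_b, r_b\rangle$

$V_b = \{S, U_S, B, U_B, D, C, \$, b, d, *\}$,  $\Sigma_b = \Sigma \cup \{*\}$

$r_b(\$) = \{2\}$,  $r_b(b) = \{1,2\}$,  $r_b(d) = \{0,1\}$,  $r_b(*) = \{1\}$.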


Figure 3.20  Pattern primitives

Figure 3.21  Character E (a) and its tree representation (b)

(B)  GECTA

Let $M_b = (V_b - \Sigma_b, F, \{S\})$ be a tree automaton that accepts the language of a tree grammar
in binary form $G_b = (V_b, r_b, P_b, S)$ over $\langle\Sigma_b, r_b\rangle$, where F is the set of transition
functions $f_x$ on $V_b - \Sigma_b$, for all $x \in \Sigma_b$. By adding error
transitions according to the transformations defined in Section 3.4.1,
the expanded tree automaton that accepts all the possible erroneous trees
is constructed as follows:

Algorithm 3.5.  Expanded Tree Automaton

Step 1.  $M_b' = (V', F', \{S\})$, where $V' = (V_b - \Sigma_b) \cup \{I\}$, $F' = F \cup F^S \cup F^D \cup F^I$, and
$F^S$, $F^D$, and $F^I$ are called substitution, deletion, and
insertion error transitions, respectively.

Step 2.  $F^S = \{f_y(\xi) \sim X_0,\ \sigma \mid f_x(\xi) \sim X_0 \in F,\ x \neq *,\ y \in \Sigma_b - \{x\}\}$

Step 3.  $F^D = \{f(\xi) \sim X_0,\ \gamma \mid f_x(\xi) \sim X_0 \in F,\ x \neq *\} \cup
\{f(\xi) \sim X_0,\ 0 \mid f_*(\xi) \sim X_0 \in F\}$

Step 4.

(a)  Add $f_y(X_0) \sim X_0,\ \delta$ to $F^I$ for all $y \in \Sigma_b - \{*\}$ if
$f_x(\xi) \sim X_0$ is in F and $x \neq *$.

(b)  Add $f_x(I,X_1) \sim X_0,\ 0$ and $f_y(I,X_1) \sim X_0,\ \sigma$, or
$f_x(I) \sim X_0,\ 0$ and $f_y(I) \sim X_0,\ \sigma$, to $F^I$ if $f_x(X_1) \sim X_0$ or
$f_x \sim X_0$ is in F, $x \in \Sigma_b$.

(c)  Add $f_*(\xi) \sim X_0,\ 0$; $f_*(X_0,I) \sim X_0,\ 0$; $f_y(X_0,I) \sim X_0,\ \sigma$;
and $f_x(X_0,I) \sim X_0,\ 0$ to $F^I$ if $f_x(\xi) \sim X_0$ is in F, for
all $x \in \Sigma_b$.

(d)  Add to $F^I$: $f_y(I) \sim I,\ \delta$ and $f_y(I,I) \sim I,\ \delta$ for
all $y \in \Sigma_b$, and $f_y \sim I,\ \delta$ for all $y \in \Sigma_b - \{*\}$,

where $\xi \in (V_b - \Sigma_b)^i$, $i = 1, 2$.

The notations $\sigma$, $\gamma$, and $\delta$ associated with transitions represent costs
of substitution, deletion, and insertion errors, respectively. Hence,
weighted distance can be measured by using the expanded tree automaton.

(a), (b), and (c) in Step 4 introduce stretch, branch, and split
operations, respectively.
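
Steps 2 and 3 of the construction, the substitution and deletion transitions, can be sketched in Python as follows. The sketch makes several assumptions: a transition $f_x(\xi) \sim X_0$ is encoded as (x, args, X0), a deletion transition carries None in place of the label, the set of labels allowed as substitutes is passed in by the caller, and numeric costs stand in for $\sigma$ and $\gamma$. Step 4 (insertions) is not covered.

```python
def error_transitions(F, substitutes, sigma=1, gamma=1, pseudo="*"):
    """Return (FS, FD) as lists of (label, args, X0, cost) tuples."""
    FS, FD = [], []
    for x, args, x0 in F:
        if x != pseudo:
            for y in substitutes:
                if y != x:
                    FS.append((y, args, x0, sigma))   # Step 2: label y read in place of x
            FD.append((None, args, x0, gamma))        # Step 3: a real label was deleted
        else:
            FD.append((None, args, x0, 0))            # Step 3: pseudo symbol costs nothing
    return FS, FD


# A few transitions in the style of the character E automaton
F = [("b", ("D",), "C"), ("*", ("C",), "UB"), ("d", (), "D")]
FS, FD = error_transitions(F, substitutes=["a", "b", "d", "h"])
```

Passing the substitute labels explicitly leaves it to the caller whether the start terminal or the pseudo symbol may appear as a substitute.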

The search algorithm for the least-cost (minimum-distance) solution
constructs a tree-like transition table with all candidate states and
their corresponding costs recorded. Each element in the transition table
corresponds to a node in the tree domain of the input tree. Let it be denoted
as $t_a$ if $a \in D_{\alpha'}$, where $\alpha'$ is the input tree; then a pair $(X,\eta)$ is in $t_a$
if X is a candidate state representing subtree $\alpha'/a$ and $\eta$ is the minimum
cost associated with X. The algorithm is given as follows:

Algorithm 3.6.  Minimum-Distance GECTA

Input:  Expanded tree automaton $M_b' = (V', F', \{S\})$ of a tree language
$L(G_t)$, and input tree $\alpha'$.

Output:  $d(L(G_t),\alpha')$, and the transition table of $\alpha'$.

Method:

<SUBTREE(Start)>

(a)  SUBTREE(Start) $= \{(X_0,\gamma) \mid f \sim X_0,\ \gamma \in F^D\}$.

(b)  Add $(X_0,\eta)$ to SUBTREE(Start) if $f(X_1) \sim X_0,\ \gamma \in F^D$ and
$(X_1,\pi) \in$ SUBTREE(Start); then $\eta = \gamma + \pi$.

(c)  Add $(X_0,\eta)$ to SUBTREE(Start) if $f(X_1,X_2) \sim X_0,\ \gamma \in F^D$ and
$(X_1,\pi_1)$ and $(X_2,\pi_2) \in$ SUBTREE(Start);
then $\eta = \gamma + \pi_1 + \pi_2$.

(d)  Whenever two or more items are associated with the same
state, delete the items with higher costs.

<SUBTREE(X,θ)>

(a)  SUBTREE$(X,\theta) = \{(X_0,\eta) \mid f(X) \sim X_0,\ \gamma \in F^D,\ \eta = \gamma + \theta\} \cup \{(X,\theta)\}$.

(b)  Add $(X_0,\eta)$ to SUBTREE$(X,\theta)$ if $f(Y) \sim X_0,\ \gamma \in F^D$ and
$(Y,\pi) \in$ SUBTREE$(X,\theta)$; then $\eta = \gamma + \pi$.

(c)  Add $(X_0,\eta)$ to SUBTREE$(X,\theta)$ if $f(Y_1,Y_2) \sim X_0,\ \gamma \in F^D$,
$(Y_1,\pi_1) \in$ SUBTREE(Start) and $(Y_2,\pi_2) \in$ SUBTREE$(X,\theta)$,
or $(Y_1,\pi_1) \in$ SUBTREE$(X,\theta)$ and $(Y_2,\pi_2) \in$ SUBTREE(Start);
then $\eta = \gamma + \pi_1 + \pi_2$.

(d)  Delete redundant states.

Step 1.  If $r[\alpha'(a)] = 0$, $\alpha'(a) = x$, then

(a)  $(X_0,\eta)$ is in $t_a$ if $f_x(X) \sim X_0,\ \lambda \in F \cup F^S \cup F^I$, for
all $(X,\pi)$ in SUBTREE(Start), $\eta = \lambda + \pi$.

(b)  $(X_0,\eta)$ is added to $t_a$ if $f_x(X_1,X_2) \sim X_0,\ \lambda \in F \cup F^S \cup F^I$,
$(X_1,\pi_1)$ and $(X_2,\pi_2) \in$ SUBTREE(Start),
$\eta = \lambda + \pi_1 + \pi_2$.

(c)  Delete redundant states.

Step 2.  If $r[\alpha'(a)] = 1$, $\alpha'(a) = x$, then

(a)  $(X_0,\eta)$ is in $t_a$ if $f_x(Y) \sim X_0,\ \lambda \in F \cup F^S \cup F^I$, for
all $(Y,\pi) \in$ SUBTREE$(X,\theta)$ and $(X,\theta) \in t_{a\cdot 1}$, $\eta = \lambda + \pi$.

(b)  $(X_0,\eta)$ is added to $t_a$ if $f_x(Y_1,Y_2) \sim X_0,\ \lambda \in F \cup F^S \cup F^I$,
for all $(Y_1,\pi_1) \in$ SUBTREE$(X,\theta)$ and $(Y_2,\pi_2) \in$ SUBTREE(Start),
or $(Y_1,\pi_1) \in$ SUBTREE(Start) and
$(Y_2,\pi_2) \in$ SUBTREE$(X,\theta)$, where $(X,\theta) \in t_{a\cdot 1}$; then
$\eta = \lambda + \pi_1 + \pi_2$.

(c)  Delete redundant states.

Step 3.  If $r[\alpha'(a)] = 2$, $\alpha'(a) = x$, then

(a)  $(X_0,\eta)$ is in $t_a$ if $f_x(Y_1,Y_2) \sim X_0,\ \lambda \in F \cup F^S \cup F^I$,
for all $(Y_1,\pi_1) \in$ SUBTREE$(X_1,\theta_1)$ and $(Y_2,\pi_2) \in$
SUBTREE$(X_2,\theta_2)$, where $(X_1,\theta_1) \in t_{a\cdot 1}$ and $(X_2,\theta_2) \in t_{a\cdot 2}$;
then $\eta = \lambda + \pi_1 + \pi_2$.

(b)  When $x \neq *$, add $(X_0,\eta)$ to $t_a$ if $f_*(Y_1,Y_2) \sim Z,\ \theta \in F \cup F^I$
and $f_x(Z_1,Z_2) \sim X_0,\ \lambda \in F \cup F^S$, $(Z_1,\pi_1) \in$ SUBTREE$(Z,\theta)$,
$(Z_2,\pi_2) \in$ SUBTREE(Start); then
$\eta = \lambda + \pi_1 + \pi_2$.

(c)  Delete redundant states.

Step 4.  If $(S,\eta)$ is in $t_0$, then $d(L(G_t),\alpha') = \eta$; exit.

For all states in $M_b'$, a table of minimal deleted trees is first
computed by the two subroutines SUBTREE(Start) and SUBTREE$(X,\theta)$. $(X_0,\eta)$ is
generated from subroutine SUBTREE(Start) if $X_0 \Rightarrow_{G_b} \psi$, and $\psi$ is the
smallest subtree among all the possible derivations starting from
nonterminal $X_0$. Similarly, $(X_0,\eta)$ is generated from subroutine SUBTREE$(X,\theta)$
if $X_0 \Rightarrow_{G_b} \psi$, where X is the state of a frontier node in $\psi$, and all the other
nodes in $\psi$ are labeled by terminal symbols. Furthermore, $\psi$ has the least
number of nodes among all the possible derivations. Steps (1), (2), and
(3) are formulated under the assumptions that: (i) the trees represented
by the states of the immediate successors of the current node may be
subtrees of the actual derivation, (ii) there are one or more subtrees
that may be deleted between the current node and the node right adjacent
to it. For example, if rule $X_0 \to x(Y)$ is applied at $t_a$, but $(X,\theta) \in t_{a\cdot 1}$,
then $(Y,\pi)$ must be in SUBTREE$(X,\theta)$. In other words, let $\beta$ be the tree
represented by state X and $\alpha$ be the tree represented by state Y; then $\beta$
is a subtree of $\alpha$, where nodes in $\alpha$ but not in $\beta$ are assumed to be
deleted; the cost of deletion is $\pi - \theta$, which is the minimum among all the
possible deletions. All the insertions are handled by transitions in
$F^I$ automatically.
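
The subroutine SUBTREE(Start) amounts to a least-cost closure over the deletion transitions, which can be sketched in Python as below. The encoding of a deletion transition as (args, X0, cost), the numeric cost of 1 standing in for $\gamma$, and the state names are assumptions of this sketch; costs are taken to be nonnegative.

```python
def subtree_start(deletions):
    """Return {state: minimum cost of a fully deleted subtree derivable from it}.

    deletions: iterable of (args, X0, cost) for f(args) ~ X0 in F^D, where
    args is a tuple of zero, one or two states.
    """
    best = {}
    changed = True
    while changed:                                   # repeat until no item improves
        changed = False
        for args, x0, cost in deletions:
            if any(a not in best for a in args):     # a successor state is not yet derivable
                continue
            total = cost + sum(best[a] for a in args)
            if total < best.get(x0, float("inf")):   # keep only the cheapest item per state
                best[x0] = total
                changed = True
    return best


# Deletion transitions in the style of the character E automaton, with gamma = 1
FD = [
    ((), "D", 1),
    (("D",), "C", 1),
    (("C",), "UB", 0),
    (("UB", "D"), "B", 1),
    (("B",), "US", 0),
]
table = subtree_start(FD)
```

Because an entry is only replaced by a strictly cheaper one and all costs are nonnegative, the loop stops once every state holds its minimum deletion cost.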

Example 3.4.  Recognition of Hand-Written Character E

Suppose a casually written character E, as shown in Figure
3.23(a), whose binary tree representation $\alpha'^*$ is shown in Figure
3.23(b), is to be recognized. The expanded tree automaton
constructed from the tree grammar in Appendix D-1 is as follows:


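$M_b' = (V', F', \{S\})$

$V' = \{S, U_S, B, U_B, D, C, I\}$,  $\Sigma_b = \{a, b, d, h, \$, *\}$

F:
     $f_\$(U_S,D) \sim S,\ 0$
     $f_*(B) \sim U_S,\ 0$
     $f_b(U_B,D) \sim B,\ 0$
     $f_*(C) \sim U_B,\ 0$
     $f_d \sim D,\ 0$
     $f_b(D) \sim C,\ 0$

$F^S$:
     $f_y(U_B,D) \sim B,\ \sigma$     $y \in \{a,d,h\}$
     $f_y \sim D,\ \sigma$            $y \in \{a,b,h\}$
     $f_y(D) \sim C,\ \sigma$         $y \in \{a,d,h\}$

$F^D$:
     $f(B) \sim U_S,\ 0$
     $f(U_B,D) \sim B,\ \gamma$
     $f(C) \sim U_B,\ 0$
     $f \sim D,\ \gamma$
     $f(D) \sim C,\ \gamma$

$F^I$:
     $f_y(B) \sim B,\ \delta$
     $f_y(D) \sim D,\ \delta$
     $f_y(C) \sim C,\ \delta$         $y \in \{a,b,d,h\}$
     $f_*(I,B) \sim U_S,\ 0$
     $f_*(I,C) \sim U_B,\ 0$
     $f_d(I) \sim D,\ 0$
     $f_y(I) \sim D,\ \sigma$         $y \in \{a,d,h\}$
     $f_b(I,D) \sim C,\ 0$
     $f_y(I,D) \sim C,\ \sigma$       $y \in \{a,d,h\}$
     $f_*(U_S,D) \sim S,\ 0$
     $f_*(S,I) \sim S,\ 0$
     $f_\$(S,I) \sim S,\ 0$
     $f_*(U_S,I) \sim U_S,\ 0$
     $f_*(U_B,D) \sim B,\ 0$
     $f_*(B,I) \sim B,\ 0$
     $f_b(B,I) \sim B,\ 0$
     $f_y(B,I) \sim B,\ \sigma$       $y \in \{a,d,h\}$
     $f_*(U_B,I) \sim U_B,\ 0$
     $f_*(D) \sim C,\ 0$
     $f_*(C,I) \sim C,\ 0$
     $f_b(C,I) \sim C,\ 0$
     $f_y(C,I) \sim C,\ \sigma$       $y \in \{a,d,h\}$
     $f_y(I) \sim I,\ \delta$         $y \in \{a,b,d,h,*\}$
     $f_y \sim I,\ \delta$            $y \in \{a,b,d,h\}$
     $f_y(I,I) \sim I,\ \delta$       $y \in \{a,b,d,h,*\}$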


Figure 3.23  Distorted character E (a) and its binary tree
representation $\alpha'^*$ (b).

Using the GECTA, the transition table of $\alpha'^*$ is illustrated in
Figure 3.24 (a) and (b). From Figure 3.24 (b), since $(S, \sigma + \delta + \gamma)$ is
in $t_0$, we have $d(L(G_E),\alpha') = \sigma + \delta + \gamma$.

3.k.3  An  Illustrative  Example  on  Character  Recognition 

A classic example, the hand-printed character recognition problem, is
given in this section to illustrate the operation of GECTA. Input
characters are assumed to be digitized patterns in a 16 x 16 format.
After chain coding [77] and primitive extraction, input patterns are
transformed into their corresponding tree representations. Grammars for
sample characters are given; hence, GECTA's are constructed. Input
patterns are then analyzed by the GECTA's and classified based on the
minimum distance criterion.

An example of an input pattern is shown in Figure 3.25 (a). Assume
the leftmost point of the top row to be the starting point. From the
starting point, the input pattern is chain coded point by point. The
successor point is coded as A, B, ..., or H according to its position
relative to the current point. The resulting chain code is shown in
Figure 3.25 (b).

In Figure 3.25 (b), the point labeled "I" is the starting point. The
majority code in a continuous line segment consisting of eight coded
points is selected as the primitive of that line segment. Primitives
selected by this method approximate the primitives defined in Figure
3.20. When a line terminates or branches before it reaches eight
points, the short line, if longer than two points, is treated as a
normal line segment and thus as a primitive. Otherwise, the line is
neglected. The binary tree representation of pattern E is given in
Figure 3.25 (c). The leftmost 1 of each string in the column entitled
"tree domain" corresponds to the identity of U in Definition 3.1.
The rest of the symbols in the string are added according to Definition
3.2. Therefore, 1, 11, 12, 111, ..., etc. in Figure 3.25 (c) correspond
to tree domains 0, 1, 2, 1.1, ..., etc. by Definition 3.2.

Figure 3.25 Input Format (a), Chain Code Result (b),
and Binary Tree Representation (c). Panel (c) is a table with the
columns TREE DOMAIN, PRIMITIVE, and RANK.
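The correspondence between the printed node labels and tree-domain addresses can be sketched in a few lines. This is only an illustration: the function name `to_tree_domain` is hypothetical, and it assumes that every digit after the leading 1 is a single-digit child index.

```python
def to_tree_domain(label: str) -> str:
    """Convert a printed node label such as "112" into a tree-domain address.

    The leftmost "1" marks the root (address 0); every following digit is
    taken to be the child index along the path from the root (assumption).
    """
    if not label or label[0] != "1":
        raise ValueError("label must start with the root marker '1'")
    path = label[1:]
    return "0" if not path else ".".join(path)


for printed in ["1", "11", "12", "111"]:
    print(printed, "->", to_tree_domain(printed))
```

The four labels quoted in the text map to 0, 1, 2, and 1.1 under this reading.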

Five  characters,  A,  C,  D,  E,  H,  are  used  in  this  experiment.  The 
assumed  sample  patterns  for  each  of  the  five  characters  and  their  tree 
representations  are  shown  in  Figure  3.26.  Their  respective  tree  grammars 
are given in Appendix D.1.

A total of 26 patterns, casually or meticulously printed characters,
are tested. Assume that $\sigma = \gamma = \delta = 1$. The input patterns, their tree
representations, the classification results, and their distances
from the assigned sample patterns are given in Appendix D.2. The entire
recognition scheme is programmed in Fortran IV on a CDC 6500 computer.
The CPU time used for chain coding and generating the tree representation
of each input is given under the title "time used for linking a tree."
The actual recognition time is designated as "Time used for parsing." The
average recognition time is 4.1 sec. per character.

These 26 test patterns are also processed by the (non-error-correcting)
tree automata constructed respectively for grammars $G_A$, $G_C$, $G_D$, $G_E$, $G_H$
given in Appendix D.1. Evidently, a pattern is acceptable only when its
distance from a sample pattern is zero (see Appendix D.2). Therefore,
only six out of the 26 test patterns are recognizable. The processing
time is 0.012 sec. per character (averaged over the six recognizable
patterns). Compared with their corresponding non-error-correcting schemes,
error-correcting tree automata have the potential of recognizing highly
noisy and distorted patterns. The trade-off is the processing efficiency.
Using the techniques suggested in Section 2.4.3 can reduce the
processing time significantly.

When a large set of training samples, including noisy and distorted
patterns, is available, a pattern grammar inferred from these samples
will reliably represent its corresponding pattern structure. A
conventional (non-error-correcting) syntax analyzer can thus perform
recognition tasks satisfactorily. On the other hand, when the number
of training samples is small, the inferred pattern grammar usually
describes the actual pattern structure poorly. A conventional syntax
analyzer designed according to the inferred grammar may often reject
(noisy) patterns that should be accepted. In such a case, an
error-correcting syntax analyzer, such as the proposed ECTA, could
recognize (or accept) these (noisy) patterns by using the
minimum-distance criterion.


CHAPTER  4 


CLUSTERING  ANALYSIS  FOR  SYNTACTIC  PATTERNS 


4.1 Introduction

In statistical pattern recognition, a pattern is represented by a
vector, called a feature vector. The similarity between two patterns can
often be expressed by a distance, or more generally speaking, a metric
in the feature space. Cluster analysis can be performed on a set of
patterns on the basis of a selected similarity measure [3]. In syntactic
or linguistic pattern recognition [5], a pattern is represented by a
sentence in a language. The sentence could be a string, a tree, or a
graph of pattern primitives and relations. The emphasis of such a
representation is on the structure of patterns, which is described by the
syntax of a language. A similarity measure between two syntactic patterns
must include the similarity of both their structures and primitives.
In Chapter 2 and Chapter 3, we have proposed distance measures for strings
and trees, which lead to the study of clustering analysis for syntactic
patterns.

The conventional clustering methods, such as the minimum spanning
tree, the nearest (or K-nearest) neighbor classification rule, and the
method of clustering centers, can be extended to syntactic patterns. We
shall briefly describe the extension in Section 4.2. An illustrative
example using a set of character patterns is presented in Section 4.3.

The studies described in Section 4.2 and Section 4.3 are mainly
on the pattern-to-pattern basis. An input sentence (a pattern) is
compared with sentences in a formed cluster, one by one, or with the
representation (cluster center) of the cluster. In Section 4.5 we shall
use the distance measure between a sentence and a language proposed in
Chapter 2. The proposed clustering procedure is combined with a
grammatical inference procedure and an error-correcting parsing technique.
The idea is to model the formed cluster by inferring a grammar, which
implicitly characterizes the structural identity of the cluster. The
language generated by the grammar may be larger than the set consisting
of the members of the cluster, and includes some possible similar
patterns due to the recursive nature of the grammar. Then the distance
between an input sentence (a syntactic pattern) and a language (a group
of syntactic patterns) is computed by using an ECP (error-correcting
parser). The recognition is based on the nearest neighbor rule.


In order to determine $\min_j d(x_j^i, y)$, for some i, the distance between

string-to-string correction algorithm proposed in [26] yields exactly
the distance between two strings defined in Definition 2.4. We shall
briefly describe the algorithm in Appendix G.1.
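As a rough illustration of such a string-to-string distance, the sketch below computes the minimum number of substitutions, insertions, and deletions with a standard dynamic-programming table. It is an assumption that this coincides step by step with the algorithm of [26] in Appendix G.1; the function name `string_distance` is hypothetical.

```python
def string_distance(x: str, y: str) -> int:
    """Minimum number of substitutions, insertions and deletions
    needed to transform x into y (unweighted)."""
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # distance from x[:i] to the empty string
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # substitute (free on a match)
            ))
        prev = curr
    return prev[-1]


# Two string representations taken from Table 4.1 (patterns 15 and 43).
print(string_distance("bbb+cxaa", "bbb+cxa"))
```

Only two rows of the table are kept at a time, so the memory use grows with the length of the shorter dimension rather than with the product of both lengths.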

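The nearest neighbor classification rule can be easily extended
to the K-nearest neighbor rule. Let $\tilde{X}_i = \{\tilde{x}_1^i, \tilde{x}_2^i, \ldots, \tilde{x}_{n_i}^i\}$ be a
reordered set of $X_i$ such that $d(\tilde{x}_j^i, y) \le d(\tilde{x}_l^i, y)$ iff $j < l$, for all
$1 \le j, l \le n_i$; then

$$\text{decide } y \in \begin{matrix} C_1 \\ C_2 \end{matrix} \text{ if } \sum_{j=1}^{K} \frac{1}{K}\, d(\tilde{x}_j^1, y) \lessgtr \sum_{j=1}^{K} \frac{1}{K}\, d(\tilde{x}_j^2, y) \qquad (4.2)$$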

We shall describe a clustering procedure in which the classification
of an input pattern is based on the nearest (or K-nearest) neighbor rule.
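The K-nearest neighbor rule of (4.2) can be sketched as follows. The names `knn_average_distance` and `classify_knn` are hypothetical, the distance is passed in as a function, and the sketch generalizes the two-class rule to any number of classes, which is an assumption beyond the text.

```python
def knn_average_distance(samples, y, k, dist):
    """Average distance from y to its k nearest members of one class."""
    nearest = sorted(dist(x, y) for x in samples)[:k]
    return sum(nearest) / len(nearest)


def classify_knn(class_samples, y, k, dist):
    """Return the class whose k nearest samples are closest to y on average.

    class_samples maps a class label to a non-empty list of samples.
    """
    return min(
        class_samples,
        key=lambda label: knn_average_distance(class_samples[label], y, k, dist),
    )


# Toy usage with numbers standing in for sentences (illustrative only).
toy = {"C1": [1, 2, 10], "C2": [5, 6, 7]}
print(classify_knn(toy, 4, k=2, dist=lambda a, b: abs(a - b)))
```

With K = 1 the function reduces to the nearest neighbor rule.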


Algorithm 4.1

Input:  A set of samples $X = \{x_1, x_2, \ldots, x_n\}$ and a design parameter,
        or threshold, t.

Output: The partition of X into m clusters, $C_1, C_2, \ldots, C_m$.

Method:

Step 1. Assign $x_1$ to $C_1$, j = 1, m = 1.

Step 2. Increase j by one. If $D = \min_l d(x_l^i, x_j)$ is the minimum,
        $1 \le i \le m$, and

        (i)  $D \le t$, then assign $x_j$ to $C_i$
        (ii) $D > t$, then initiate a new cluster for $x_j$, and
             increase m by one.

Step 3. Repeat Step 2 until all the elements of X have been put
        in a cluster.
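A minimal sketch of Algorithm 4.1 with the nearest neighbor rule is given below. The function name `threshold_clustering` is hypothetical, and the distance is supplied by the caller; for sentences it would be a string distance, here a toy numeric distance is used for brevity.

```python
def threshold_clustering(samples, t, dist):
    """Sequentially assign each sample to the nearest cluster, or open a
    new cluster when the nearest one is farther away than the threshold t."""
    clusters = []
    for x in samples:
        if clusters:
            # D is the smallest distance from x to any member of any cluster.
            d_best, best = min(
                (min(dist(member, x) for member in cluster), index)
                for index, cluster in enumerate(clusters)
            )
            if d_best <= t:
                clusters[best].append(x)
                continue
        clusters.append([x])  # first sample, or D > t
    return clusters


print(threshold_clustering([1, 2, 10, 11, 3], t=2, dist=lambda a, b: abs(a - b)))
```

The result depends on the input order and on t, which is the design parameter mentioned in the text.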


Note that, in Algorithm 4.1, a design parameter is required. A
commonly used clustering procedure is to construct a minimum spanning
tree. Each node on the minimum spanning tree represents an element in
the sample set X. The tree is then partitioned. Actually, when the
distances between all of the pairs, $d(x_i, x_j)$, $x_i, x_j \in X$, are available,
the algorithm for constructing the minimum spanning tree is the same as
that where X is a set of feature vectors in statistical pattern
recognition [94]. The algorithm is given in Appendix G.2.
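For illustration, a minimum spanning tree over pairwise distances can be built with Prim's method as sketched below. Whether Appendix G.2 uses this particular construction is an assumption, and the name `minimum_spanning_tree` is hypothetical.

```python
def minimum_spanning_tree(samples, dist):
    """Return the tree as a list of edges (i, j, distance) over sample indices."""
    n = len(samples)
    if n == 0:
        return []
    # best[v] = (cheapest known distance from v to the tree, tree node reached)
    best = {v: (dist(samples[0], samples[v]), 0) for v in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda v: best[v][0])  # closest node outside the tree
        d, i = best.pop(j)
        edges.append((i, j, d))
        for v in best:  # the new tree node j may offer a shorter connection
            dv = dist(samples[j], samples[v])
            if dv < best[v][0]:
                best[v] = (dv, j)
    return edges


print(minimum_spanning_tree([0, 1, 3, 7], dist=lambda a, b: abs(a - b)))
```

Partitioning the tree can then be done by removing its longest edges; the cut criterion is left open here, as in the text.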


4.2.2. The Cluster Center Techniques

Let us define a β-metric for a sentence $x_j^i$ in cluster $C_i$ as follows,

$$\beta_j^i = \frac{1}{n_i} \sum_{l=1}^{n_i} d(x_j^i, x_l^i) \qquad (4.3)$$

Then, $x_j^i$ is the cluster center of $C_i$, if $\beta_j^i = \min_l \{\beta_l^i \mid 1 \le l \le n_i\}$.
$x_j^i$ is also called the representation of $C_i$, denoted $A_i$.
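The cluster center of (4.3) is the member with the smallest average distance to all members of its cluster. A small sketch, with the hypothetical name `cluster_center` and a caller-supplied distance:

```python
def cluster_center(cluster, dist):
    """Return the member of a non-empty cluster that minimizes the
    beta-metric, i.e. the average distance to all members."""
    def beta(x):
        return sum(dist(x, other) for other in cluster) / len(cluster)

    # When several members share the minimum, the first one is returned.
    return min(cluster, key=beta)


print(cluster_center([1, 2, 3, 10], dist=lambda a, b: abs(a - b)))
```

Because the center is always one of the observed sentences, no averaging of strings is needed.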

The following clustering algorithm is given in [94].


Algorithm 4.2

Input:  A sample set $X = \{x_1, x_2, \ldots, x_n\}$.

Output: The partition of X into m clusters.

Method:

Step 1. Let m elements of X, chosen at random, be the "representation"
        of the m clusters. Let them be called $A_1, A_2, \ldots, A_m$.

Step 2. For all i, $x_i \in X$ is assigned to cluster j, iff
        $d(A_j, x_i)$ is minimum.

Step 3. For all j, a new mean $A_j$ is computed. $A_j$ is the new
        representation of cluster j.

Step 4. If no $A_j$ has changed, stop. Otherwise, go to Step 2.
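The four steps above can be sketched as follows. The "new mean" of Step 3 is taken here to be the cluster center of (4.3), which is an assumption; the name `cluster_by_centers` and the iteration cap `max_iter` are hypothetical additions.

```python
def cluster_by_centers(samples, initial_centers, dist, max_iter=100):
    """Alternate between assigning samples to the nearest representation
    and recomputing each representation, until nothing changes."""
    centers = list(initial_centers)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign every sample to its nearest representation.
        clusters = [[] for _ in centers]
        for x in samples:
            j = min(range(len(centers)), key=lambda c: dist(centers[c], x))
            clusters[j].append(x)
        # Step 3: the new representation minimizes the total distance
        # to the members; an empty cluster keeps its old representation.
        new_centers = [
            min(c, key=lambda a: sum(dist(a, b) for b in c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        # Step 4: stop when no representation has changed.
        if new_centers == centers:
            break
        centers = new_centers
    return clusters, centers


print(cluster_by_centers([1, 2, 3, 10, 11, 12], [1, 2], lambda a, b: abs(a - b)))
```

The initial representations are passed in explicitly so that a run is repeatable; the text chooses them at random.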


4.3. An Illustrative Example


4.3.1. Data Preparation

A set of English characters is used to illustrate the proposed
clustering procedure. There are 51 characters from nine different
classes: D, F, H, K, P, U, V, X, and Y. Eight of the nine classes are
selected from four groups, each with characters of similar structure;
that is, D and P, H and K, U and V, and X and Y. The class of character
F is different from the other eight classes. Each character is a
continuous line pattern on a 20 x 20 grid as shown in Figure 4.1(a).
Starting from its lower left corner, each input pattern is chain-coded
[77] cell by cell by a subroutine. Each successive cell is coded as A,
B, C, or D according to its position relative to that of the current
one. After three consecutive cells have been coded, a pattern primitive
of this line segment is extracted.

(a)  Primitive and Subpattern Extraction

Four pattern primitives, a, b, c, and d, which are line segments with
four different orientations, are selected. For example, ABA, CCA, or ADD
are reduced to primitive a, b, or d, respectively. A primitive
extraction subroutine PRIMITIVE is constructed and several typical results
from the subroutine are shown in Figure 4.2. In the meantime, the
chain-code subroutine searches for singular points of the input pattern
for segmentation purposes. Coordinates of the singular points at the two
ends of a line segment (called a branch) are then recorded. For example,
the chain-coded result for the character P shown in Figure 4.1(a) is
given in Figure 4.1(b), in which the symbol "S" represents singular
points. From Figure 4.1(b), the pattern has three branches, which,
together with the coordinates of their end points, are shown in Figure
4.1(c). Each branch is a subpattern, and the three branches consist of
primitive strings bbb (Branch 1), b (Branch 2), and dddbdd (Branch 3),
respectively.

Figure 4.1 The primitive extraction of a character P

(b)  String Representation

From the recorded coordinates of starting and terminating points
(tail and head) of each branch, the subroutine CONCATE determines
concatenation relations between branches. Following Shaw's PDL [95],
three concatenation relations, +, x, *, and the parentheses ( and ) are
used. However, * is used here primarily for the situation of a "self loop";
that is, a branch of which the head and the tail coincide. In such a
case, the notation (B)* is used where B is the branch. The priority of
the three concatenation relations follows the order of *, x, and then
+. Consequently, redundant parentheses are eliminated. For the character
P in Figure 4.1(a), the concatenations of branches are expressed as
BRANCH 1 + (BRANCH 2 + BRANCH 3)*. The final string representation in
terms of pattern primitives is bbb + (b + dddbdd)*. Similarly, the
character K in Figure 4.3 is represented by BRANCH 1 + BRANCH 2 x BRANCH 3 x
BRANCH 4, and finally by a string of primitives b + bbbxaaxbc.

The 51 chain-coded patterns, with their pattern numbers which represent
the input sequence, are shown in Figure 4.4. The string representations
extracted by the subroutines CHAIN-CODE, PRIMITIVE and CONCATE are
listed in Table 4.1.


4.3.2.  Experiments 

(a)  The  Distance 

The proposed distance measure is performed on the linguistic
representations of patterns (sentences), rather than on the patterns
themselves. Consequently, whether the model yields a good measurement
is a matter of choosing representations. Figure 4.5 illustrates a couple
of examples. In Figure 4.5(a) the distance between the two K's is
smaller than that between a K and an X. Similarly, in Figure 4.5(b) the
distance between a U and a distorted U, 'M', is still smaller than that
between the distorted U and an H.*

Pattern                            Pattern
No.      String Representation     No.      String Representation
-------  ------------------------  -------  ------------------------
 1       bbb+(b+dddbdd)*           27       bbb+(dx(b+dd))xdd
 2       aaa+cddxaaxcbb            28       bb+cbxaaa
 3       (bbb+ddcbaa)*             29       bb+bbxddd+axbb
 4       cbbbxdabbbb               30       baa+bbxaaxcc
 5       bb+(b+dd)xd               31       ba+abxdd+axbbb
 6       bbbb+ccxbb                32       aa+ccxaxc
 7       (bbb+dxddcbbaa)*          33       bb+(bb+dxddcaad)*
 8       ba+bbxaaxcc               34       bb+bbxa+axcb
 9       aaa+cxaaxb                35       bb+(bb+d)xdd
10       b+bbxdddd+bbxb            36       cbbxdaabb
11       (bbb+dxddbbad)*           37       ba+bcxaa
12       bb+(bb+dd)xd              38       aa+ccxaaxcc
13       cbbbxabbba                39       bb+(bb+dd)xdd
14       bb+bbxaaxcc               40       (bbbbb+dddcbaad)*
15       bbb+cxaa                  41       dab+cbxba
16       cbbxda+bbxb               42       bb+(b+ddd)xd
17       bb+bbbxaaxbcc             43       bbb+cxa
18       bb+(bbb+dddba)*           44       b+dxabxdd+bxbb
19       b+bbbxd+bxdxb             45       (bb+ddcbdd)*
20       aa+cbxaaxcc               46       aa+cxaaxcc
21       bbb+(bbadcbad)*           47       bbb+ddaabcddd
22       bbbxddabb                 48       bbbxda+bbxb
23       bbb+bcxaa                 49       baa+ccxbbxccc
24       b+(bb+ddcbad)*            50       ba+bxa+abxbbb
25       b+bbbxaaxbc               51       cbbxdabbbb
26       bbbbbxabaa

Table 4.1 The 51 character patterns and
their representations


Using the algorithm in Appendix G.2, the minimum spanning tree for the 51
characters is constructed and shown in Figure 4.6. The true clusters
are circled on the tree.


The Clustered Results Using Algorithm 4.1

The results of clustering based on the K-nearest neighbor classification
rule are given in Figure 4.7, Figure 4.8 and Figure 4.9 for K = 1 and
t = 6, K = 3 and t = 6, and K = 3 and t = 6.5, respectively, where t is
the preset threshold. The case K = 1 is the same as using the nearest
neighbor rule.


The clustering procedure given in Algorithm 4.2 does not require
a preset parameter. Let the algorithm be initiated by choosing the
first nine input patterns, patterns 1 to 9, as the representations of
the 9 classes. The procedure becomes stable after three iterations. The
results of all the iterations are given in Table 4.2. The final clusters
are shown in Figure 4.10. The representation of each formed cluster is
marked with a square.

*In this section, all the experiments use unweighted distance. However,
we exclude the possibility that a primitive, namely a, b, c, or d, is
substituted for a relation symbol, namely +, x, *, (, or ), and vice versa.

Figure 4.5 The distances between similar
and dissimilar patterns.

(b)  the second iteration

Table 4.2 The three iteration results
using Algorithm 4.2

(e)  Remark

From the experiments described in (d) and their results, the main
weaknesses of conventional clustering algorithms remain when the data
are in the form of sentences. These weaknesses are: (1) the requirement
of thresholds of a somewhat arbitrary nature, (2) the requirement of
large memory, for example, a table of n(n-1)/2 distances if the sample
set X has n objects, and (3) the chain effect. However, the chain effect
could be reduced by using a weighted distance.


4.4.  A Proposed  Nearest  Neighbor  Syntactic  Recognition  Rule 

With the distance between a sentence and a language defined in
Chapter 2, we can construct a syntactic recognizer using the nearest
(or K-nearest) neighbor rule. Suppose that we are given two classes
of patterns characterized by grammars $G_1$ and $G_2$, respectively. For an
unknown syntactic pattern y, decide that y is in the same class as
$L(G_1)$ if

$$d(L(G_1), y) < d(L(G_2), y)$$

and decide that y is in the same class as $L(G_2)$ if

$$d(L(G_2), y) < d(L(G_1), y) \qquad (4.4)$$

The distance $d(L(G_i), y)$ can be determined by a minimum-distance ECP
constructed for $G_i$. Consequently, a grammatical inference procedure
is required to infer a grammar for each class of pattern samples. Since
the parser also gives the structural description of y, the syntactic
recognizer gives both the classification and description of y as its
output. We shall summarize the procedure in the following algorithm.


Algorithm 4.3

Input:  m sets of syntactic pattern samples
        $X_1 = \{x_1^1, x_2^1, \ldots, x_{n_1}^1\}, \ldots, X_m = \{x_1^m, x_2^m, \ldots, x_{n_m}^m\}$
        and a pattern y with unknown classification.

Output: The classification and structural description of y.

Method:

Step 1. Infer m grammars $G_1, G_2, \ldots, G_m$ from $X_1, X_2, \ldots, X_m$,
        respectively.

Step 2. Construct minimum-distance ECP's, $E_1, E_2, \ldots, E_m$, for
        $G_1, G_2, \ldots, G_m$, respectively.

Step 3. Calculate $d(L(G_k), y)$ for all k = 1, ..., m. Determine l
        such that

        $$d(L(G_l), y) = \min_k d(L(G_k), y)$$

        y is then classified as class l. In the meantime, the
        structural description of y can be obtained from $E_l$.
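To make Step 3 concrete, the sketch below computes the unweighted distance between a sentence and a language and then picks the nearest class. It is restricted to finite-state grammars given as transition tables and uses a shortest-path search; it is not the MDECP of Chapter 2. The names `distance_to_language` and `classify`, and the toy grammars, are hypothetical.

```python
import heapq


def distance_to_language(transitions, start, accepting, y):
    """Smallest number of substitutions, insertions and deletions that
    turns y into some sentence accepted by the finite-state automaton.

    transitions maps a state to a list of (symbol, next_state) pairs.
    """
    n = len(y)
    best = {(start, 0): 0}
    heap = [(0, start, 0)]  # (cost, state, symbols of y consumed)
    while heap:
        cost, state, i = heapq.heappop(heap)
        if cost > best[(state, i)]:
            continue  # stale queue entry
        if i == n and state in accepting:
            return cost
        moves = []
        if i < n:
            moves.append((cost + 1, state, i + 1))  # y[i] is an extra symbol
        for symbol, nxt in transitions.get(state, ()):
            moves.append((cost + 1, nxt, i))  # symbol is missing from y
            if i < n:  # match (free) or substitution (cost 1)
                moves.append((cost + (symbol != y[i]), nxt, i + 1))
        for c, s, k in moves:
            if c < best.get((s, k), float("inf")):
                best[(s, k)] = c
                heapq.heappush(heap, (c, s, k))
    return float("inf")  # the language is empty


def classify(grammars, y):
    """Nearest neighbor rule: return (class label, distance)."""
    distances = {
        name: distance_to_language(t, s, f, y) for name, (t, s, f) in grammars.items()
    }
    label = min(distances, key=distances.get)
    return label, distances[label]


# Toy classes: the languages b+ and a+ (illustrative, not from the text).
B_PLUS = ({"S": [("b", "S"), ("b", "F")]}, "S", {"F"})
A_PLUS = ({"S": [("a", "S"), ("a", "F")]}, "S", {"F"})
print(classify({"B": B_PLUS, "A": A_PLUS}, "bbab"))
```

Recording the predecessor of each search node would also give the corrected sentence, which plays the role of the structural description in this simplified setting.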


4.5. A Clustering Procedure for Syntactic Patterns
4.5.1. The Algorithm

Using the distance defined in Chapter 2 as a similarity measure
between a syntactic pattern and a set of syntactic patterns, we can
perform a cluster analysis on syntactic patterns. The procedure again
involves error-correcting parsing and grammatical inference. In contrast
to the nearest neighbor rule in Section 4.4, which uses a supervised
inference procedure, the procedure described in this section is basically
non-supervised. When the syntactic pattern samples are observed sequentially,
a grammar can be easily inferred for the samples observed at each stage
of the clustering procedure. We propose the following clustering procedure
for syntactic patterns:


Algorithm 4.4

Input:  A set of syntactic pattern samples $X = \{x_1, x_2, \ldots, x_n\}$
        where $x_i$ is a string of terminals or primitives. A threshold t.

Output: The assignment of $x_i$, i = 1, ..., n to m clusters and the
        grammar $G^{(k)}$, k = 1, ..., m, characterizing each cluster.

Method:

Step 1. Input the first sample $x_1$, infer a grammar $G_1^{(1)}$ from
        $x_1$. $L(G_1^{(1)}) \supseteq \{x_1\}$.

Step 2. Construct an error-correcting parser for $G_1^{(1)}$.

Step 3. Input the second sample $x_2$; use the parser to determine whether
        or not $x_2$ is similar to $x_1$ by comparing the distance
        between $L(G_1^{(1)})$ and $x_2$, i.e., $d(x_2, L(G_1^{(1)}))$, with a
        threshold t.

        (i)  If $d(x_2, L(G_1^{(1)})) < t$, $x_1$ and $x_2$ are put into the
             same cluster (Cluster 1). Infer a grammar $G_2^{(1)}$
             from $\{x_1, x_2\}$.

        (ii) If $d(x_2, L(G_1^{(1)})) \ge t$, initiate a new cluster for
             $x_2$ (Cluster 2) and infer a new grammar $G_1^{(2)}$ from
             $x_2$. In this case, there are two clusters characterized
             by $G_1^{(1)}$ and $G_1^{(2)}$, respectively.

Step 4. Repeat Step 2, construct error-correcting parsers for
        $G_2^{(1)}$ or $G_1^{(2)}$ depending upon $d(x_2, L(G_1^{(1)})) < t$ or
        $d(x_2, L(G_1^{(1)})) \ge t$, respectively.

Step 5. Repeat Step 3 for a new sample. Until all the pattern
        samples are observed, we have m clusters characterized
        by $G_{n_1}^{(1)}, G_{n_2}^{(2)}, \ldots, G_{n_m}^{(m)}$, respectively.
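The control flow of Algorithm 4.4 can be sketched independently of any particular inference or parsing method. In the sketch below both are passed in as functions; the name `syntactic_clustering` and its parameters `infer` and `distance` are hypothetical, and the toy usage replaces grammars by plain sets of numbers.

```python
def syntactic_clustering(samples, t, infer, distance):
    """Sequential clustering: each cluster is (members, grammar).

    infer(members) returns a model of the cluster (a grammar in the text);
    distance(grammar, x) returns the sentence-to-language distance.
    """
    clusters = []
    for x in samples:
        scored = [(distance(g, x), k) for k, (_, g) in enumerate(clusters)]
        d_best, k_best = min(scored) if scored else (None, None)
        if scored and d_best < t:
            # Case (i): join the nearest cluster and re-infer its grammar.
            members = clusters[k_best][0] + [x]
            clusters[k_best] = (members, infer(members))
        else:
            # Case (ii): start a new cluster with its own grammar.
            clusters.append(([x], infer([x])))
    return clusters


# Toy usage: the "grammar" is just the set of members (illustrative only).
result = syntactic_clustering(
    [1, 2, 10, 3, 11],
    t=2,
    infer=lambda members: set(members),
    distance=lambda g, x: min(abs(m - x) for m in g),
)
print([members for members, _ in result])
```

The comparison uses a strict inequality, following cases (i) and (ii) of Step 3.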


The parsers (non-error-correcting) constructed according to
$G_{n_1}^{(1)}, G_{n_2}^{(2)}, \ldots, G_{n_m}^{(m)}$ could then form a syntactic recognizer directly
for the m-class recognition problem.

The threshold t is a design parameter. It can be determined from
a set of pattern samples with known classifications. For example, if
we know that the sample $x_i$ is from Class 1 characterized by $G^{(1)}$ and the
sample $x_j$ is from Class 2 characterized by $G^{(2)}$, then $t < d(x_i, x_j)$.
Or, more generally speaking,

$$t < \min \{ d(L(G^{(2)}), x_i),\; d(L(G^{(1)}), x_j) \} \qquad (4.5)$$

For m classes characterized by $G^{(1)}, G^{(2)}, \ldots, G^{(m)}$, respectively, we can
choose

$$t < \min_{k,\,l} \{ d(L(G^{(l)}), x^{(k)}) \}, \quad k \ne l \qquad (4.6)$$

where $x^{(k)}$ is a pattern sample known from Class k and $G^{(l)}$ is the
grammar characterizing Class l ($l \ne k$). If the above required information
is not available, an appropriate value of t will have to be determined
on an experimental basis until a certain stopping criterion is satisfied
(for example, with a known number of clusters).
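The bound in (4.6) is simply the smallest distance between any labeled sample and the language of any other class. A short sketch, in which the name `threshold_upper_bound` and the callable `distance_to_class` are hypothetical:

```python
def threshold_upper_bound(labeled_samples, classes, distance_to_class):
    """Smallest distance from a sample of class k to a different class l.

    labeled_samples is a list of (class label, sample) pairs; any
    threshold t chosen below the returned value satisfies (4.6).
    """
    return min(
        distance_to_class(l, x)
        for k, x in labeled_samples
        for l in classes
        if l != k
    )


# Toy usage: classes are sets of numbers (illustrative only).
models = {"A": {1, 2}, "B": {10}}
samples = [("A", 1), ("A", 2), ("B", 10)]
print(threshold_upper_bound(
    samples, models, lambda l, x: min(abs(m - x) for m in models[l])
))
```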

4.5.2  An  Experiment 

Let the same set of pattern samples used in Section 4.3 be tested
by Algorithm 4.4. The subroutines for finding string representations
of the 51 character patterns are still used here. In addition, we will
have subroutines for grammatical inference and error-correcting parsing.


(1)  Grammatical Inference (Steps 1, 3, and 5 in Algorithm 4.4)

By comparing its distances to existing clusters with a threshold t,
an input pattern sample is assigned to an existing cluster or a new
cluster. In either case, a grammar is first inferred for the single
input sample. The inferred grammar is then merged into the grammar
(by merging their productions) characterizing the assigned cluster.
The subroutine REDUCE combines productions of the two grammars, removes
identical productions, and unifies nonterminals. To use a simple
inference procedure, input samples are assumed to be generated by
finite-state grammars. However, non-self-embedding context-free
productions are used to describe concatenation of branches. Each branch
is described by finite-state productions. For the character P shown in
Figure 4.1, the inferred grammar is given in Table 4.3.
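Because the grammar inferred for pattern 1 is non-self-embedding, the language listed with Table 4.3, strings of the form b^n1 + (b^n2 + d^n3 b d^n4)* with positive exponents, is regular, so membership can be checked with a regular expression. The sketch below is only a convenience check under that reading; the names `PATTERN_1_LANGUAGE` and `in_language` are hypothetical.

```python
import re

# b^n1 + ( b^n2 + d^n3 b d^n4 ) *  with n1..n4 >= 1; the relation symbols
# +, (, ) and * are literal characters and therefore escaped.
PATTERN_1_LANGUAGE = re.compile(r"b+\+\(b+\+d+bd+\)\*")


def in_language(sentence: str) -> bool:
    """True if the whole sentence belongs to the language of Table 4.3."""
    return PATTERN_1_LANGUAGE.fullmatch(sentence) is not None


print(in_language("bbb+(b+dddbdd)*"))   # pattern 1 itself
print(in_language("bb+(bbb+dddba)*"))   # pattern 18 as listed in Table 4.1
```

Pattern 1 is accepted, while pattern 18 is not, so assigning pattern 18 to this cluster requires error correction.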

(2)  Error-Correcting Parser (ECP - Steps 2 and 4)

The error-correcting parser used in this example is the MDECP given
in Algorithm 2.1 and 2.2. Certain realistic assumptions are made to
reduce the number of error productions. For example, we do not allow
a substitution error between a, b, c, or d and +, x, *,
(, or ). Also, the concatenation symbol + or x cannot be inserted at
the end of a string.
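The effect of forbidding substitutions between primitives and relation symbols can be seen on a plain string-to-string distance, sketched below. This is not the MDECP itself, it uses unit costs, and it does not model the rule about + or x at the end of a string; the names `substitution_cost` and `constrained_distance` are hypothetical.

```python
PRIMITIVES = set("abcd")
RELATIONS = set("+x*()")  # not used directly; listed to document the alphabet


def substitution_cost(p: str, q: str) -> float:
    """Unit cost within a symbol category, forbidden across categories."""
    if p == q:
        return 0.0
    if (p in PRIMITIVES) != (q in PRIMITIVES):
        return float("inf")  # primitive <-> relation symbol is excluded
    return 1.0


def constrained_distance(x: str, y: str) -> float:
    """Edit distance with unit insertions and deletions and the
    restricted substitution cost above."""
    prev = [float(j) for j in range(len(y) + 1)]
    for i, cx in enumerate(x, start=1):
        curr = [float(i)]
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1.0,                             # delete cx
                curr[j - 1] + 1.0,                         # insert cy
                prev[j - 1] + substitution_cost(cx, cy),   # substitute
            ))
        prev = curr
    return prev[-1]


print(constrained_distance("a+b", "aab"))
```

Without the restriction the two strings in the usage line would be one substitution apart; with it, the cheapest correction is one deletion plus one insertion.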

A simplified  flow  chart  for  the  complete  experiment  is  given  in 
Figure  4.11. 

$G_1^{(0)} = (V_N, V_T, P, S_0)$

$V_N = \{S_0, BA, KA, BC, EA, EB\}$

$V_T = \{(, +, x, *, ), b, d\}$

P:   S_0 → BA + KA *
     KA  → ( BA + BC )
     BA  → b BA
     BA  → b
     BC  → d BC
     BC  → d EA
     EA  → b EB
     EB  → d EB
     EB  → d

$L(G_1^{(0)}) = \{ b^{n_1} + (b^{n_2} + d^{n_3} b\, d^{n_4})^{*} \mid n_1, n_2, n_3, n_4 \text{ are positive integers} \}$.

Table 4.3 An inferred grammar for pattern 1.

Figure 4.11 Flow chart of the clustering Algorithm 4.4. Boxes
recoverable from the scan: call CHAIN-CODE; call PRIMITIVE;
CONCATE($x_i$); call ECP(j, D); threshold test (yes/no); REDUCE($G_i$).
Legend: n: number of pattern samples; i: input sequence; m: current
number of clusters; j: the closest cluster for input i; l: assigned
cluster for input i; $G_i = (V_i, \Sigma_i, P_i, SS)$ is the reduced grammar
equivalent to $G_i^{(1)} \cup G_i^{(2)} \cup \cdots \cup G_i^{(m)}$.

(3)  Experimental Results

Following the clustering procedure described in Section 4.5, three
experiments were performed: (i) using unweighted (Levenshtein) distance
with threshold t = 3, (ii) using unweighted (Levenshtein) distance with
threshold t = 4, and (iii) using weighted distance with threshold t = 3.
The results of the clustering analysis are listed in Table 4.4. In Table 4.4
pattern samples are grouped according to the clustering result of
experiment (iii); pattern numbers in the first column correspond to those
listed in Figure 4.4. The number in the parentheses of the last column
in Table 4.4 represents the weighted distance between the pattern and
the most similar cluster; that is, the cluster to which the pattern
is assigned.

For Experiment (i), there were 11 clusters formed. Pattern 18 was
not assigned to the same cluster as the other P's, and Pattern 27
was not assigned to the same cluster as the other F's. For Experiment (ii),
t was increased from 3 to 4, and only 7 clusters were formed. Both Pattern 18
and Pattern 27 were correctly clustered. However, all the X's and the
K's were assigned to the same cluster. In both experiments, Patterns 13,
26 and 47 were not correctly clustered. For Experiment (iii), 10 clusters
were formed; all the patterns except Pattern 47 were correctly clustered.
The weights associated with each error production used in the experiment
are given in Table 4.5.

The final inferred grammar $G_n$, n = 51, from the third experiment,
is listed in Table 4.6. The grammar is the union of the inferred grammars
for each cluster; namely, $G^{(0)}, G^{(1)}, G^{(2)}, \ldots, G^{(9)}$. Nonterminals
$S_0, S_1, S_2, \ldots, S_9$ in the grammar are the start symbols of $G^{(0)}, G^{(1)},
G^{(2)}, \ldots, G^{(9)}$, respectively. In order to carry out the parsing for
all clusters using a single parser, we merge all the productions originating
from $S_0, S_1, \ldots, S_9$ into a single grammar $G_{51}$ by adding productions SS → $S_0$,
SS → $S_1$, ..., SS → $S_9$ where SS is the start symbol of $G_{51}$.

Pattern  String                True       Unweighted     Weighted
No.      Representation        Character  t=3     t=4    t=3
-------  --------------------  ---------  ------  -----  --------
 1       bbb+(b+dddbdd)*       P          0       0      0(-)
18       bbb+(bbb+dddbaa)*     P          0       0      0(2.2)
21       bbb+(bbadcbad)*       P          8       0      0(3.0)
24       b+(bb+ddcbad)*        P          0       0      0(1.5)
33       bb+(bb+dxddcaad)*     P          0       0      0(1.5)
 2       aaa+cddxaaxcbb        X          1       1      1(-)
 9       aa+ccxaxc             X          1       1      1(1.5)
20       aa+cbxaaxcc           X          1       1      1(0.6)
32       aaa+cxaaxb            X          1       1      1(1.0)
38       aa+ccxaaxcc           X          1       1      1(0.0)
46       aa+cxaaxcc            X          1       1      1(0.0)
49       baa+ccxbbxcc          X          1       1      1(2.7)
 3       (bbb+ddcbaa)*         D          2       2      2(-)
 7       (bbb+dxddcbbaa)*      D          2       2      2(1.0)
11       (bbb+dxddbbad)*       D          2       2      2(1.5)
40       (bbbbb+dddcbaad)*     D          2       2      2(1.0)
45       (bb+ddcbdd)*          D          2       2      2(0.9)
 4       cbbbxdabbb            U          3       3      3(-)
16       cbbxda+bbxb           U          3       3      3(2.0)
22       bbbxddabb             U          3       3      3(0.5)
36       cbbxdaabb             U          3       3      3(0.0)
48       bbbxda+bbxb           U          3       3      3(0.5)
51       cbbxdabbbb            U          3       3      3(0.0)
 5       bb+(b+dd)xd           F          4       4      4(-)
12       bb+(bb+dd)xd          F          4       4      4(0.0)
27       bbb+(dx(b+dd))xdd     F          9       4      4(2.0)
35       bb+(bb+d)xdd          F          4       4      4(0.0)
39       bb+(bb+dd)xdd         F          4       4      4(0.0)
42       bb+(b+ddd)xd          F          4       4      4(0.0)
 6       bbbb+ccxbb            Y          5       5      5(-)
15       bbb+cxaa              Y          5       5      5(2.0)
23       bbb+bcxaa             Y          5       5      5(0.9)
28       bb+cbxaaa             Y          5       5      5(1.0)
37       ba+bcxaa              Y          5       5      5(1.0)
41       dab+cbxba             Y          5       5      5(2.9)
43       bbb+cxa               Y          5       5      5(0.0)
 8       ba+bbxaaxcc           K          6       1      6(-)
14       bb+bbxaaxcc           K          6       1      6(0.9)
17       bb+bbbxaaxbcc         K          6       1      6(0.9)
25       b+bbbxaaxbc           K          6       1      6(0.0)
30       baa+bbxaaxcc          K          6       1      6(0.0)
34       bb+bbxa+axcb          K          6       1      6(1.6)
10       b+bbxdddd+bbxb        H          7       6      7(-)
19       b+bbbxd+bxdxb         H          7       6      7(1.5)
29       bb+bbxddd+axbb        H          7       6      7(1.0)
31       ba+abxdd+axbbb        H          7       6      7(2.0)
44       b+dxabxdd+bxbb        H          7       6      7(3.0)
50       ba+bxa+abxbbb         H          7       6      7(3.0)
13       cbbbxabbba            V          3       3      8(-)
26       bbbbbxabaa            V          5       5      8(0.5)
47       bbbxddaabcddd         D          10      7      9(-)

Table 4.4 The result clusters of the 51 characters

Rule              Weight
E_a -> a          0.
E_a -> b          0.9
E_a -> c          2.0
E_a -> d          1.0
E_a -> aE_a       0.
E_a -> dE_a       0.5
E_a -> +E_a       0.6
E_a -> xE_a       0.5
E_a -> λ          0.9
E_b -> b          0.
E_b -> a          1.0
E_b -> c          0.6
E_b -> d          1.5
E_b -> aE_b       1.0
E_b -> bE_b       0.
E_b -> dE_b       1.0
E_b -> +E_b       1.0
E_b -> xE_b       1.0
E_b -> λ          0.5
E_c -> c          0.
E_c -> b          1.0
E_c -> bE_c       0.9
E_c -> cE_c       0.
E_c -> λ          0.5
E_d -> d          0.
E_d -> a          1.2
E_d -> b          2.0

Table 4.5 New rules added to the original grammars
Table 4.5 Continued

Rule              Weight
E_d -> c          0.5
E_d -> dE_d       0.
E_d -> xE_d       1.0
E_d -> λ          2.1
E_x -> x          0.
E_x -> dE_x       1.0
E_x -> xE_x       0.5
E_+ -> +          0.
E_+ -> aE_+       1.0
E_+ -> x          0.5
E_* -> *          0.
E_) -> )          0.
E_) -> )E_)       0.5
E_( -> (          0.
E_( -> dE_(       0.5
E_( -> xE_(       0.5
E_( -> (E_(       0.5



G = (V_N, V_T, P, SS), where

V_N = {XY | X and Y are letters} ∪ {S_i | i = 0, 1, ..., 9}

V_T = {a, b, c, d, +, x, *, (, )}


1 

ss  - s0 

26 

Sj  ♦ KD 

2 

s0  -*•  BA  ♦ KA  * 

27 

KD  BG  x BK 

3 

KA  ( BA  ♦ BC) 

28 

BK  -*•  d El 

4 

BA  •*  b BA 

29 

El  ♦ a BA 

5 

BA  ♦ b 

30 

ss  -►  sv 

6 

BC  ♦ d BC 

31 

S^  ♦ BA  ♦ KF 

7 

BC  ♦ d EA 

32 

KE  ( BA  + EB  ) 

8 

EA  -*•  b EB 

33 

KF  ♦ KE  x EB 

9 

EB  -*•  d EB 

34 

SS  ♦ S$ 

10 

EB  ♦ d 

35 

Sj  -*■  BA  + KG 

11 

SS  ♦ S, 

36 

KG  BQ  x BA 

12 

S,  ♦ BO  ♦ KB  x BG 

37 

BQ  -k  c BQ 

13 

KB  - BE  x BD 

38 

BQ  -*•  c 

1* 

BD  -*■  a BD 

39 

S2  -►  KH  * 

15 

BD  ■*  a 

40 

KH  ■♦•  ( BA  + EB  x 1 

16 

BE  -*•  c EB 

41 

EF  -►  b EF 

17 

BG  ♦ c BA 

42 

ss  Sg 

18 

ss  ♦ s2 

43 

Sg  ♦ EF  + Kl  x BQ 

19 

S2  «►  KC  * 

44 

Kl  -*■  BA  x BD 

20 

KC  ♦ ( BA  ♦ B1  ) 

45 

Sj  -►  BD  + KJ  x BA 

21 

Bl  ♦ d Bl 

46 

KJ  -*■  BQ  x BD 

22 

Bl  ♦ d EE 

47 

SS  -►  S^ 

23 

EE  ♦ c EF 

48 

Sj  -►  BA  ♦ KK  + KL 

24 

EF  ♦ b BD 

49 

KK  -►  BA  x EB 

25 

ss  ♦ s3 

50 

KL  BA  x BA 

Table 4.6 The grammar G


Table 4.6 Continued


51 

S2  KM  * 

82 

S3* 

KS 

52 

KM-*-(BA  + EBxBR) 

83 

KS  -*• 

BA  x 

BK 

53 

BR  -*•  d BR 

84 

BK  -*• 

d BK 

5* 

BR  -*•  d EG 

85 

S5 

BA  + 

KT 

55 

EG  + b EG 

86 

KT  -*• 

BU  x 

BD 

56 

EG  -*•  b EH 

87 

so  - 

BA  + 

KU  * 

57 

EH  -*■  a EB 

88 

Ml-*- 

( BA 

+ EJ 

) 

58 

ss  -s8 

89 

EJ  ♦ 

d EJ 

59 

Sg  KN 

90 

S8 

KW 

60 

KN  BG  x BS 

91 

KW  -v 

BA  x 

BS 

61 

BS  + a EF 

92 

S4^ 

BA  + 

KZ 

62 

Sg  BA  + Kl  x BQ 

93 

KY  -► 

( EB 

x KE 

) 

63 

S$  -*■  BA  + KJ 

94 

KZ  ♦ 

KY  x 

EB 

64 

Sj  -*■  KO  + KL 

95 

S5 

BA  + 

HA 

65 

KO  -*•  BG  x BT 

96 

MA  -*• 

BG  x 

BD 

66 

BT  - d BD 

97 

S7 

BA  + 

KK  + 

MC 

67 

Sg  -*■  BA  + Kl  x BU 

98 

MC  f 

BD  x 

BA 

68 

BU  -v  b BQ 

99 

S7 

EF  + 

HD  + 

MC 

69 

SQ  BA  + KP  * 

100 

MD  -*■ 

El  x 

EB 

70 

KP  -*-  ( BA  + BV  ) 

101 

S1 

BD  + 

MA  x 

BQ 

102 

s« 

BA  + 

ME  * 

71 

BV  •*>  d BV 

0 

72 

BV  -*•  d EF 

103 

ME  •*■ 

( BA 

+ EB 

x CB  ) 

73 

S7  BA  + KK  + KK  x BA 

104 

CB 

d CB 

74 

Sj  BD  + KJ  x BQ 

105 

CB  -*• 

d EL 

75 

SQ  -*•  BA  + KR  * 

106 

EL  -*• 

c EH 

76 

KR  -*•  ( CA  ) 

107 

EH  ♦ 

a EH 

77 

CA  b CA 

108 

s6 

BA  + 

Kl  + 

MF 

78 

CA  -k  b El 

109 

MF  -*■ 

BD  x 

BG 

79 

El  a EJ 

110 

El  -*• 

a El 

80 

EJ  -*•  d EK 

111 

Se 

EF  + 

KT 

81 

EK  c EG 

112 

s,  - 

BD  + 

KJ  x 

BQ 

Table 4.6 Continued


113  S2  - KU  * 

114  S5  *►  BK  + MG 

115  MG  BG  x EF 

116  Sk  * BA  + MH 

117  MH  *►  K£  x EB 

118  S$  *►  BA  + Ml 

119  Ml  -*•  BQ  x BD 

120  Sj  -*•  BA  + MJ  x EB  + MK 

121  MJ  EB  x El 

122  MK  -*■  BA  x BA 

123  S2  ML  * 

124  ML  -*■  ( BA  + CC  ) 

125  CC  -*■  d CC 

126  CC  -*■  d EH 

127  EN  -*•  c EA 

128  SS  -*•  Sg 

129  Sg  MM 

130  MM  -►  BA  x CD 

131  CD  d CD 

132  CD  -*■  d EQ 

133  EQ  * a EQ 

134  EQ  *►  a ER 

135  ER  -*■  b BE 

136  S3  MN  + KL 

137  MN  -*■  BA  x BT 

138  Sj  -*•  EF  + KG  x BQ 

139  S7  -*•  EF  + Kl  + MP 

140  MP  -►  El  x BA 
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4.6.  Conclusions  and  Remarks 

In Sections 4.4 and 4.5, we have demonstrated that, by using an
error-correcting parser, the distance between a syntactic pattern and
a group of syntactic patterns can be determined. Such a distance can
be used for nearest neighbor recognition and for cluster analysis of
syntactic patterns. Essential parts of both applications are
grammatical inference and error-correcting parsing. If the correct
classifications of the pattern samples are known, the proposed nearest
neighbor syntactic recognition rule can be applied to determine the
classification and structural description of an unknown pattern. Using
Algorithm 2.3 to determine the average distance between a sentence and
the K nearest sentences in the language, the proposed rule for a single
nearest neighbor can easily be extended to the K nearest neighbors.
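
The nearest neighbor rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: each class is represented by a small finite list of sample strings standing in for the language of its inferred grammar, and the distance is the ordinary weighted edit (Levenshtein) distance over substitution, insertion and deletion. It does not reproduce the error-correcting parser or Algorithm 2.3. The function names, the unit weights, and the two sample classes (strings taken from Table 4.4) are chosen purely for illustration.

```python
def edit_distance(x, y, w_sub=1.0, w_ins=1.0, w_del=1.0):
    """Weighted edit distance between strings x and y (dynamic programming)."""
    # prev[j] = cost of turning the empty prefix of x into y[:j]
    prev = [j * w_ins for j in range(len(y) + 1)]
    for i, a in enumerate(x, start=1):
        curr = [i * w_del]  # cost of deleting the first i symbols of x
        for j, b in enumerate(y, start=1):
            substitute = prev[j - 1] + (0.0 if a == b else w_sub)
            delete = prev[j] + w_del
            insert = curr[j - 1] + w_ins
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


def k_nearest_average(x, sentences, k=1):
    """Average distance from x to its k nearest sentences in a finite sample."""
    dists = sorted(edit_distance(x, s) for s in sentences)
    k = min(k, len(dists))
    return sum(dists[:k]) / k


def classify(x, classes, k=1):
    """Assign x to the class whose k nearest samples are closest on average."""
    return min(classes, key=lambda name: k_nearest_average(x, classes[name], k))


# Hypothetical training samples: two characters from Table 4.4.
classes = {
    "U": ["cbbbxdabbb", "cbbxdaabb"],
    "Y": ["bbb+cxaa", "bbb+cxa"],
}
label = classify("cbbxdabbbb", classes, k=1)
```

With an error-correcting parser, the inner minimum would range over every sentence of the language rather than over a finite sample, which is what allows a pattern to be close to a cluster even when it is far from each stored prototype.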

When the correct classifications of the pattern samples are unknown,
a nonsupervised procedure must be used. In this case, the proposed
clustering procedure can be applied. Although error-correcting parsers
are used in the cluster analysis, once the clustering result is obtained
only conventional (non-error-correcting) parsers need to be implemented
for recognition. Furthermore, the inferred grammars can be in
finite-state form, in which case the construction of conventional parsers
is straightforward, and the parsing procedure is, in general, deterministic
and efficient. The proposed clustering procedure can certainly be
extended to syntactic patterns represented by trees, since tree grammar
inference procedures [93] and error-correcting tree automata have already
been developed. When a general error-correcting parser is used, the
computer time required for cluster analysis can be long. (For
example, using a CDC 6500 computer with the FORTRAN IV programming language,
the average computer time for analyzing each pattern in Experiment (iii)
is 37 sec.) Nevertheless, this requirement may not be critical, since
a clustering algorithm is used primarily for pattern analysis rather
than recognition. Besides, inference procedures for special grammars [74]
(such as finite-state and precedence grammars) can always be employed to
speed up the analysis.

The grammar inferred for each cluster often generates not only the
sentences (syntactic patterns) already in the cluster, but also sentences
with similar structures. For example, the grammar in Table 4.3 is
an inferred grammar for Pattern 1, bbb+(b+dddbdd)*. However, the language
it generates is
$\{\, b^{n_1} + (b^{n_2} + d^{n_3}\, b\, d^{n_4})* \mid n_1, n_2, n_3, n_4 \text{ are positive integers} \,\}$,
which represents the character P in different sizes (Cluster 0). Consequently,
the unweighted distance between Pattern 18 (a character P) and Cluster 0
is 2, although the unweighted distance between Pattern 18 and Pattern 1
is 4. For this reason, the clustering procedure proposed in Section 4.5
appears to be more effective and flexible than computing the
distance between an input pattern sample and a set of prototype or
reference patterns on a sentence-by-sentence basis.

With syntactic or linguistic representations of data becoming more
and more common in pattern recognition, speech and language analysis,
and database systems [98-101], the nearest neighbor recognition rule and
clustering procedures for syntactic patterns should find their applications
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CHAPTER  5 

A SYNTACTIC  APPROACH  TO 
TEXTURE  ANALYSIS 


5.1  Introduction 

Research on texture analysis, including modeling, synthesis, classification,
and discrimination, has received increasing attention in recent years [78].
Texture is a term for the quality of a surface. The feature that dominates
a texture scene is a repetitive or quasi-repetitive pattern. Texture
information is valuable in scene segmentation, especially in those cases
in which the contrast between the object to be observed and the background
is poor. A survey of research in this area can be found in Zucker [79].
Applications of texture analysis include terrain classification [80-81],
radiographic image interpretation [82,83], microscopic cell image analysis
[84-86], materials inspection [87], and many others.

Most  of  the  previous  research  has  concentrated  on  the  statistical 
approach  [80-90].  In  this  approach,  statistical  properties  are  calculated 
from  a set  of  local  measurements  taken  from  the  pattern.  Weszka,  Dyer, 
and Rosenfeld [81] give a comparative study of several frequently used
features  for  texture  classification. 

An alternative to the statistical approach for texture analysis
is the structural approach [78]. A texture is considered to be defined
by subpatterns which occur repeatedly according to a set of well-defined
placement rules within the overall pattern. Furthermore, the subpattern
itself is made of structured elements. Compared with the statistical
approach, the structural approach appears to be easier to interpret.

Zucker [79] proposed the idea of texture modeling in terms of an
ideal texture and its transformations. According to Zucker, an ideal
texture is a deep, unobservable, highly structured perfect pattern in which
local primitives (fundamental building blocks) are extended into a global
structure such as a regular tessellation. He believes that transformation
rules can be defined from a representation of an ideal texture to that
of a natural texture. In our work, we shall propose a tree grammar which
defines such rules.

Carlucci [91] has formulated a system called "texture language" for
the description of some simple repetitive subpatterns such as polygons
or open polygonals. Texture patterns are treated as graphs, with the
basic elements in the texture language representation being lines and
vertices. The structure of a subpattern is then represented as a tree.
From a practical point of view, Carlucci's system may encounter difficulties
during preprocessing, such as difficulty in the extraction of lines and
vertices in a texture region.

Ehrich and Foith [42] propose a tree language approach, called
"relational trees," for the description of the structure of waveforms.
They believe that information about textures in an image can be obtained
by sequential analysis of individual scan lines. The change of gray
levels along a scan line gives a random waveform which can be represented
by a relational tree.

We shall propose a texture model based on the structural approach.
In this model, a texture pattern is divided into fixed-size windows.
Repetitions of subpatterns, or a portion of a subpattern, may appear in a
window. In all cases, we shall treat a windowed pattern as a subpattern.
A tree grammar is then used to characterize windowed patterns of the same
class. This model can be used for texture synthesis as well as
discrimination. Since the windowed patterns are also a part of the global
structure of the texture, a higher level of syntax rules can also be
constructed for the arrangement of windowed patterns.

5.2  A Syntactic  Model  for  Texture  Analysis 

We propose the following syntactic model for texture analysis. Its
primitive, window, tree representation, and tree grammar are described
in this section.

5.2.1  The  Primitive 

We choose a single pixel to be the pattern primitive, with each gray
level giving a different primitive. For a picture of $\ell$ gray levels,
we have $\ell$ different primitives. However, the size of a primitive can
certainly be larger than a single pixel. For example, a window of n x n
pixels with a relatively uniform gray level would be a good primitive.

5.2.2  The  Window 

From the structural point of view, texture is the placement of
structured subpatterns (see Figure 5.1(b)). However, in a natural
scene, the exact boundary of a subpattern is usually vague and
unidentifiable. Often, subpatterns of the same texture appear in various
sizes, shapes, or brightness. In some cases, there are no well-defined
subpatterns at all; for examples of this, see Figures 5.1(c) and 5.1(d).
In addition, the placement of subpatterns can be irregular and distorted,
as shown in Figure 5.1(a). Nevertheless,
Figure 5.1(a) Pattern D22. Reptile Skin.

Figure 5.1 Four texture patterns obtained from digitizing pictures
found in Brodatz's book, Textures.
Figure 5.1(b) Pattern D34. Netting.

Figure 5.1(c) Pattern D38. Water.
a small subframe of the overall texture pattern does maintain some of the
characteristics of the texture. To make the syntactic approach practically
feasible, pictures are divided into fixed-size windows. A grammar is then
used to characterize the windowed patterns of the given texture. Assuming
that the window is of size k x k, there are $\ell^{k^2}$ possible patterns
for $\ell$ gray levels. The set of all the windowed patterns of a particular
texture is a subset of these $\ell^{k^2}$ patterns. Consequently, a
high-dimensional regular grammar, for example, a tree grammar, is suitable
for the characterization of texture patterns.
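
As a worked instance of this count, take the binary, 9 x 9 setting used later in the chapter (the values are substituted here only for illustration):

```latex
\ell = 2,\quad k = 9:\qquad \ell^{k^2} = 2^{81} \approx 2.4 \times 10^{24}
```

Even with only two gray levels, the set of possible windows is far too large to enumerate, which is why a grammar is used to describe the subset belonging to one texture.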

5.2.3  The  Tree  Representation 

Before we construct the tree grammar for a texture pattern, each
windowed pattern is first transformed into a tree representation. Each
pixel in a k x k window corresponds to a node in its tree representation,
and the pattern primitive at that pixel becomes the label assigned to the
corresponding node. For implementation, a tree structure can be chosen
arbitrarily, but it is then fixed for all windows during the process. That
is, for all tree representations, the tree structure is the same, but the
node labels are different. Two convenient tree structures are suggested in
Figures 5.2(a) and 5.2(b), where the window size is 9 x 9. We shall refer to
them as Structure A and Structure B, respectively. Clearly, a different
choice of tree structure will result in a different tree representation
for the same window. This choice will influence the complexity as well
as the effectiveness of the constructed tree grammar.
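
The idea of a fixed tree structure with window-dependent labels can be sketched as follows. The layout used here is an assumption made for illustration and is not claimed to be Structure A or Structure B of Figure 5.2: a trunk runs down the middle column, and each trunk node carries a left chain and a right chain covering the rest of its row. The names `Node`, `chain`, `window_to_tree` and `count_nodes` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: int
    children: List["Node"] = field(default_factory=list)


def chain(labels) -> Optional[Node]:
    """Build a unary chain label[0] -> label[1] -> ...; None for an empty list."""
    node = None
    for label in reversed(labels):
        node = Node(label, [] if node is None else [node])
    return node


def window_to_tree(window) -> Node:
    """Map a k x k window (list of rows) to a tree with one node per pixel.

    Assumed layout: the root is the middle pixel of the top row; each trunk
    node has, in order, a left branch, the trunk node below, a right branch.
    """
    k = len(window)
    mid = k // 2
    below = None
    for row in reversed(window):  # build the trunk from the bottom row upward
        left = chain(list(reversed(row[:mid])))  # walk outward to the left
        right = chain(list(row[mid + 1:]))       # walk outward to the right
        children = [c for c in (left, below, right) if c is not None]
        below = Node(row[mid], children)
    return below


def count_nodes(node: Node) -> int:
    return 1 + sum(count_nodes(c) for c in node.children)


# A 9 x 9 window containing a cross through its center (1 = black, 0 = white).
window = [[1 if (r == 4 or c == 4) else 0 for c in range(9)] for r in range(9)]
tree = window_to_tree(window)
```

Because the shape is the same for every window, two windows differ only in their node labels, which is the property the error-correcting step later relies on.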

Example 5.1. The pattern shown in Figure 5.3(a) has the tree
representation shown in Figure 5.3(b) if Structure A is used.
Example 5.3. The grammar in Example 5.2 will only accept patterns
of a cross in the middle of the 9 x 9 window, such as the texture
pattern shown in Figure 5.4. In a natural texture, we will most
likely have some distortions of the perfect pattern, such as the pattern
shown in Figure 5.5. Grammar G2 will generate patterns in which the
cross in Figure 5.3(a) is shifted anywhere within the window.
Figure 5.5 A distorted texture pattern which
can be generated by G2.


The  distorted  pattern  shown  in  Figure  5.5  can  be  accepted  by  G2. 


5.3 Illustrative Examples of Texture Synthesis

In Section 5.2, a syntactic model was presented for describing windowed
patterns. The global structure of the overall texture pattern depends
on the arrangement of windowed patterns. In Example 5.3, we constructed
the grammar G2 for the acceptance of the texture pattern shown in Figure
5.5. However, when G2 is used for generation, numerous patterns might
be produced, one of which is shown in Figure 5.6. Therefore, in order
to preserve the coherence between windows, a set of higher level syntax
rules is necessary, in which the windowed pattern is treated as a primitive
and the overall texture is represented as a tree which decides the
placement of windowed patterns.

In this section, we will illustrate the synthesis of patterns D22,
D34, D38, and D68 from Brodatz's Textures [92]. Figures 5.1(a), (b),
(c), and (d) are digitized pictures of these four patterns, with
resolutions of 400μ, 400μ, 100μ, and 400μ, respectively. For simplicity,
we use only two primitives: black as primitive "1," and white as
primitive "0." By setting a threshold for gray levels in Figure 5.1, we
obtain four binary pictures, Figures 5.7(a), (b), (c), and (d), for
patterns D22, D34, D38, and D68, respectively.


Figure 5.6 A random texture pattern which
could be generated by G2.
Figure 5.7(a) Binary picture of pattern D22.

Figure 5.7 Binary pictures of patterns in Figure 5.1.
5.3.1 Regular Tessellation

The texture pattern D34 is a hexagonally tessellated pattern which
is nearly what Zucker called an "ideal texture."

Example 5.4. A synthesis of a hexagonal tessellation is as follows.
Assume that we have two windowed patterns, namely, A1 and C1, shown
in Figures 5.8(a) and (b), respectively, with window size 9 x 9.
Tree grammar G3 generates the tree representations of A1 and C1, using
Structure A described in Section 5.2.3. Tree grammar G3' generates
the placement rule for A1 and C1. The placement rule is given in
Structure B. G3 and G3' are given as follows:

G3 = (V3, r, P3, {A1, C1}) over <Σ, r>

V ^ , A2 , A^ , Ajj , A^- , , Ay , C ^ ,C2,C^,C^,C^,C^,Cy,N ^ 

N^.Nq.I.O} 


A,<  /I \ 

N0  A2  N0 


/ 1 \ 

N0  *3  N0 
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No  \ No 
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»,  N, 
The generating procedure using the two-level syntax rules, G3
and G3', is illustrated in Figure 5.9. The generated pattern is
shown in Figure 5.10.
Example 5.5. In Example 5.4, we assumed that the window size matched
the size of the hexagonal subpattern. However, the 9 x 9 window in
Example 5.4 does not perfectly match the pattern D34 in Figure 5.7(b).
An improvement is shown in Figure 5.11(b), where the hexagons
have a size similar to that of pattern D34. In Figure 5.11(b),
window frames have been drawn in so that the windowed patterns and
their repetitive order can be shown clearly. There are 20 different
windowed patterns within each heavily lined area, in a 4 x 5 arrangement.
The larger pattern, made up of the 20 windows, also repeats itself.
The grammar G4 that generates the 20 windowed patterns is given on
the left-hand side of Appendix E-1. Figure 5.11(a) shows the
placement rule in Structure B for the pattern in Figure 5.11(b). The
symbol in each cell of Figure 5.11(a) belongs to the set of starting
symbols in grammar G4. From each starting symbol, the corresponding
windowed pattern of Figure 5.11(b) can be generated. The grammar
that generates the placement rule is also given in Appendix E-1,
as grammar G4'.

Example 5.6. The uneven brightness in pattern D34, e.g., darker
for horizontal lines and lighter for diagonal lines, can be simulated
by using a stochastic grammar. Figure 5.12 is the resulting
pattern from using a stochastic grammar in the generation of the
"noisy version." The grammar is given on the right-hand side
of Appendix E-1.

In this subsection, three examples were given to illustrate
the synthesis of an ideal texture: (1) for a window matching the size of
the subpattern, (2) for a window and subpattern of unmatched sizes, and
(3) for a noisy version. However, the noisy version described in Example 5.6
is the result of local noise. In the following subsection, we shall
describe the synthesis of a texture pattern whose global structure is
distorted.

5.3.2 Irregular Tessellation

Let us examine pattern D22 in Figure 5.7(a). We may consider that
pattern D22 is the result of twisting the regular tessellation of an ideal
texture such as the pattern shown in Figure 5.13. From a single window,
the trend of the distortion cannot be fully detected. For texture synthesis,
such a global distortion can be treated as a problem of the placement
of windowed patterns.

Example 5.7. The regularly tessellated pattern shown in Figure 5.13
is composed of the two basic patterns shown in Figures 5.14(a) and (b).
A distorted tessellation can result from shifting a series of basic
patterns in one direction. Let us use the set of patterns resulting
from shifting a basic pattern as the set of primitives. There will
be 81 such windowed pattern primitives. We shall refer to them
simply as primitives in this example. Figure 5.15 shows several
of them. Each primitive is given a name of two symbols, "X_i,"
where X ∈ {A, B, C, D, E, F, G, H, I} and i ∈ {1, 2, ..., 9}. Starting
from X_i, the pattern resulting from shifting one column to the left will
be named X_{i+1}, and the pattern resulting from shifting one row up will
be named Y_i. The grammar in Appendix E-2 is constructed for the
generation of the 81 primitives.
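
The naming scheme can be made concrete with a short sketch. Several points here are assumptions for illustration rather than statements from the text: shifts wrap around cyclically inside the 9 x 9 window, Y is taken to be the letter that follows X, and the base pattern (one black row and one black column) is a hypothetical stand-in for the basic patterns of Figure 5.14.

```python
LETTERS = "ABCDEFGHI"


def shift(window, rows_up, cols_left):
    """Cyclically shift a k x k window up by rows_up and left by cols_left."""
    k = len(window)
    return [[window[(r + rows_up) % k][(c + cols_left) % k] for c in range(k)]
            for r in range(k)]


def make_primitives(base):
    """Name every shifted copy of base: letter = row shift, index = column shift + 1."""
    primitives = {}
    for row_shift, letter in enumerate(LETTERS):
        for i in range(1, 10):
            primitives[f"{letter}{i}"] = shift(base, row_shift, i - 1)
    return primitives


# Hypothetical base pattern: black top row and black left column.
base = [[1 if (r == 0 or c == 0) else 0 for c in range(9)] for r in range(9)]
primitives = make_primitives(base)
```

Nine row shifts times nine column shifts give the 81 primitives mentioned in the example; a placement tree then only has to choose which named primitive goes in each cell.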

Several synthesis results are given in Figures 5.16(a), (b), and
(c). The tree representations using Structure B that decide the placement
of windowed patterns are shown at the left-hand side of each pattern.
Using the same idea as that in Example 5.6, a stochastic grammar can
be used to add local distortions. An example of this is shown in
Figure 5.16(d). We can also construct a grammar for the placement of
windowed patterns for certain types of structural distortion, for
example, a pattern twisted upward or downward, or the insertion of an
extraneous row of subpatterns.

5.3.3 Random Pattern

The texture patterns D38 and D68 in Figures 5.7(c) and (d) show a
higher degree of randomness than D22 and D34. No clear tessellation or
subpattern exists in these patterns.

Example 5.8. The water waves in pattern D38 can be described as a
belt extending in the horizontal direction, varying in width and
twisting upward or downward. Assuming that, at most, one belt can
appear in a window, we shall use Structure A for tree representation
and a stochastic tree grammar Gg to describe such patterns. In each
production rule in Gg, the left-hand side nonterminal is the present
state, and the right-hand side generates the width and the position
of the belt that the present state represents, as well as the next
state. Figure 5.17 illustrates this generation process. The
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Figure  5.16(a) 

Figure  5.16  Synthesis  results  of  pattern  D22 
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Figure  5.17  The  syntactic  generation  of  water  waves. 
grammar Gg is given in Appendix F-1. Gg is also the discrimination
grammar for pattern D38, which will be discussed in Section 5.4.
The production rules associated with zero probability are not used
during pattern generation; they are added for pattern discrimination.
The probabilities associated with the production rules in Gg are
arbitrarily assigned. By varying the assignment of probabilities,
patterns with different degrees of brightness and fluctuation can
be generated. Some resulting patterns are shown in Figure 5.18.
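
The state-to-state generation described in this example can be imitated with a small random process. The sketch below is not the grammar of Appendix F-1: the state is simply the pair (top row, width) of the belt, each step emits one column and then moves to the next state, and the probabilities `p_move` and `p_resize`, the width limit, and the function name are all assumptions chosen for illustration.

```python
import random


def generate_belt(k=9, p_move=0.3, p_resize=0.3, max_width=3, seed=None):
    """Generate a k x k binary window containing one horizontal belt.

    State = (top, width). Each step writes one column for the present
    state, then chooses the next state at random.
    """
    rng = random.Random(seed)
    top, width = k // 2, 2
    columns = []
    for _ in range(k):
        # Emit the column described by the present state.
        columns.append([1 if top <= r < top + width else 0 for r in range(k)])
        # Choose the next state: twist up/down and/or change the width.
        if rng.random() < p_move:
            top += rng.choice((-1, 1))
        if rng.random() < p_resize:
            width += rng.choice((-1, 1))
        width = max(1, min(width, max_width))
        top = max(0, min(top, k - width))
    # Transpose the list of columns into a list of rows.
    return [[columns[c][r] for c in range(k)] for r in range(k)]


window = generate_belt(seed=0)
```

Raising `p_move` makes the belt fluctuate more, and raising `max_width` makes the pattern darker, which mirrors the remark that different probability assignments give different degrees of brightness and fluctuation.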

Example 5.9. The texture pattern of D68, the wood grain pattern,
consists of long vertical lines. It is particularly convenient
for syntactic description when Structure A is used for tree
representation. The subpattern (vertical line) and its repetition
can be fully characterized by the stochastic grammar G7. Therefore,
there is no need to generate the overall pattern window by window.
The grammar G7 is given as follows:

G7 = (V7, r, P7, A1) over <Σ, r>
V7 = {A1, N0, N1, 0, 1}
r = {0, 1, 2, 3}
Σ = {0, 1}
The density of grains (vertical lines) depends on the probabilities
associated with the production rules. The pictures in Figure 5.19 are
generated from G7 using different probability assignments.
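
The dependence of grain density on the rule probabilities can be illustrated with a much simpler stand-in for the stochastic grammar. In the sketch below, each column is independently chosen to be a full-height black line with probability `p_line`; this single parameter and the function name are assumptions for illustration and do not correspond to the actual production rules, which are not reproduced here.

```python
import random


def generate_wood_grain(height, width, p_line=0.3, seed=None):
    """Generate a binary picture made of full-height vertical lines.

    Each column is black (1) with probability p_line, otherwise white (0).
    """
    rng = random.Random(seed)
    is_line = [rng.random() < p_line for _ in range(width)]
    return [[1 if is_line[c] else 0 for c in range(width)]
            for _ in range(height)]


sparse = generate_wood_grain(18, 18, p_line=0.2, seed=0)
dense = generate_wood_grain(18, 18, p_line=0.6, seed=0)
```

Because every row is identical, the whole picture can be produced in one pass, which reflects the observation that this pattern does not need to be generated window by window.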

5.4 Texture Discrimination

The proposed texture model can also be used for texture discrimination.
In Section 5.3, we illustrated how a texture pattern is generated
window by window. The construction of a grammar for modeling variations
in size, shape, and brightness, as well as noise and distortion, was
illustrated by examples. We also noted in Section 5.2 that a pattern
in a small subframe (window) maintains some of the characteristics of
the overall texture. Under this assumption, we shall restrict the
problem of texture discrimination to the recognition of windowed patterns
only. Each picture is processed window by window.

5.4.1  Data  Preparation 

The  pattern  shown  in  Figure  5.20  consists  of  patterns  D22,  D34,  D38, 
and  D68.  There  are  180  x 180  pixels  with  128  gray  levels.  We  shall  use 
two  primitives  (two  gray  levels)  for  discrimination.  The  picture  shown 
in Figure 5.21 is obtained by setting a threshold at gray level 44.

Window  frames  are  drawn  in  Figure  5.21.  The  window  size  is  9 x 9. 
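
The data preparation step amounts to thresholding followed by cutting the picture into windows. The sketch below uses the values given in the text (180 x 180 pixels, threshold 44, 9 x 9 windows). Which side of the threshold maps to black is not stated, so treating gray levels below the threshold as primitive "1" is an assumption, as are the function names and the synthetic test image.

```python
def binarize(image, level=44):
    """Map each gray level to a primitive: 1 below the threshold, else 0.

    Assumption: low gray levels are dark, so they become primitive "1".
    """
    return [[1 if px < level else 0 for px in row] for row in image]


def split_windows(image, k=9):
    """Cut an image (list of rows) into non-overlapping k x k windows, row-major."""
    rows, cols = len(image), len(image[0])
    return [[row[c:c + k] for row in image[r:r + k]]
            for r in range(0, rows - k + 1, k)
            for c in range(0, cols - k + 1, k)]


# Synthetic 180 x 180 picture with 128 gray levels, for illustration only.
image = [[(7 * r + 13 * c) % 128 for c in range(180)] for r in range(180)]
windows = split_windows(binarize(image), k=9)
```

A 180 x 180 picture yields 20 x 20 = 400 windows, each of which is then converted to a tree and classified on its own.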


5.4.2  Discrimination  Grammars 

The texture modeling grammars described in Section 5.3 are used for
discrimination here. Let the grammars for patterns D22, D34, D38, and D68
be G22, G34, G38, and G68, respectively. From the viewpoint of
discrimination, we would like to modify the grammars so that overlaps


Figure  5.20  Pictorial  data  for  texture  discrimination 
L22, L34, L38, and L68, respectively) will be as small as possible.
At the same time, each language needs to be as general as possible in
characterizing its class of texture. Grammars G22, G34, and G68 are given
in Appendices F-2, F-3, and F-4, respectively. Grammar G38 is the
nonstochastic version of grammar Gg in Appendix F-1.

5.4.3  Error-Correcting  Parsing 

A conventional parser usually fails to recognize a "noisy"
pattern. Although we have tried to construct the discrimination grammars
to include as large a variety of patterns as possible, the uncertainty
existing in a pattern cannot be fully characterized and
predicted. An error-correcting parser can be used to improve the
classification accuracy. In particular, in this application, we shall
use the SPECTA (structure-preserved error-correcting tree automaton) as
the texture discriminator.

5.4.4  Computation  Result 

The SPECTA measures the distance between the input tree representation
and each of the texture languages L(G22), L(G34), L(G38), and L(G68), one
by one. The input pattern is then classified into the texture class which
has the minimum distance.
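
The minimum-distance decision can be sketched under a strong simplification. Since all windows share one tree structure, a structure-preserved correction only relabels nodes, so the sketch below measures distance as the number of differing labels. It replaces each language by a small finite list of exemplar label sequences, which is an assumption made for illustration; the SPECTA itself computes the distance to the whole language, and the names and 3 x 3 exemplars here are hypothetical.

```python
def substitution_distance(a, b):
    """Number of node labels that must be substituted to turn a into b."""
    if len(a) != len(b):
        raise ValueError("trees must share the same structure (same length)")
    return sum(1 for x, y in zip(a, b) if x != y)


def distance_to_class(labels, exemplars):
    """Minimum substitution distance from labels to any exemplar of a class."""
    return min(substitution_distance(labels, e) for e in exemplars)


def classify(labels, classes):
    """Return the class name with the minimum distance to labels."""
    return min(classes, key=lambda name: distance_to_class(labels, classes[name]))


# Hypothetical 3 x 3 windows, flattened row by row in a fixed node order.
classes = {
    "vertical": [[0, 1, 0, 0, 1, 0, 0, 1, 0]],
    "horizontal": [[0, 0, 0, 1, 1, 1, 0, 0, 0]],
}
noisy = [0, 1, 0, 0, 1, 0, 0, 0, 0]
label = classify(noisy, classes)
```

The noisy window is not identical to any exemplar, so an exact matcher would reject it, while the minimum-distance rule still assigns it to the closest class.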

The result of texture discrimination for the picture in Figure 5.21
is given in Figure 5.22. There are 400 windows. Thirty of them are
misrecognized. The misrecognition usually results from the unavoidable
overlap between two languages or from the reduction of one language to
decrease the overlap.
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5.5  Remarks 

In this chapter, a syntactic approach to texture modeling is
presented. The proposed approach appears to be attractive from the
practical point of view. The preprocessing involves picture digitization
only. The window operation stores only a small subframe of the pattern in
main memory. Thus, the process is manageable on a computer with a small
memory.

The most difficult part is the construction of an effective
grammar. Since no sophisticated preprocessing is used, the linguistic
representations are very sensitive to noise. In constructing a grammar,
we would like to consider as many variations of the texture pattern as
possible. On the other hand, we also need to keep the grammar as simple
(as few nonterminals and production rules) as possible to save storage
space. Such a compromise often results in a grammar that generates some
extraneous sentences but excludes some possible distortions. That is one
reason for using an error-correcting parser for picture
parsing in texture discrimination. The other reason is that the uncertainty
existing in the picture makes it difficult to construct a grammar
that fits all the possibilities of a texture class.

All the computation examples are programmed in Fortran IV on a PDP-11/45
computer with a 32K core memory. The SPECTA we designed processes
all the branches of a tree from the frontier to the root in parallel,
but it must be programmed serially on a general-purpose computer. The
process can certainly be speeded up by a specially designed processor.

Automatic grammatical inference procedures for tree languages have
recently been studied [93]. By combining an inference algorithm with
the proposed discrimination procedure, the entire training
and testing process proposed in Chapter 4 can be automated.

CHAPTER 6

SUMMARY  OF  RESULTS  AND  SUGGESTIONS 
FOR  FURTHER  RESEARCH 


6.1.  Summary  of  Results  and  Conclusions 

The problem of modeling, analysis and reconstruction of noisy and/or
distorted syntactic patterns is studied. In syntactic pattern recognition,
a pattern is described in terms of its subpatterns, primitives and the
relations among them. Segmentation errors and primitive extraction errors
can be treated as syntax errors and defined in terms of language transformation
rules. Three types of error transformations are defined on strings, namely,
substitution, insertion and deletion. Consequently, the parser constructed
according to the grammar generating the strings and the three types of
transformations is called the error-correcting parser. A stochastic
deformation model and stochastic error transformation rules are also proposed.

In searching for the most likely correction, the formulations of the
error-correcting parser (ECP) for context-free languages and context-free
programmed languages are based on the minimum-distance criterion for the
non-stochastic model and the maximum-likelihood criterion for the stochastic model.

The  error-correcting  parsing  technique  for  string  languages  has  been 
extended  to  tree  languages.  In  formulating  error-correcting  tree  automata 
(ECTA),  five  types  of  error  transformations  on  trees  are  defined,  namely, 
substitution,  split,  stretch,  branch  and  deletion.  Two  types  of  ECTA  are 
proposed;  a SPECTA  corrects  substitution  errors  only,  and  a GECTA  corrects 
all five types of errors.

By way of language transformations, the distance between two
sentences (strings or trees) can be determined. This idea provides a
necessary tool for the cluster analysis of syntactic patterns. The
algorithms for constructing the minimum spanning tree and cluster centers in
statistical pattern recognition are extended to syntactic patterns. A
definition of the distance between a sentence and a language is proposed. Based
on this definition, a new clustering procedure is proposed, in which a grammar
is inferred to characterize a formed cluster and then updated when a new
pattern is assigned to the cluster. An error-correcting parser, in this case,
is employed to measure the distance between an input syntactic pattern and a
formed cluster, or a language. Therefore, the procedure yields not only the
clustering results but also the syntax rules characterizing each cluster.
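
The clustering procedure summarized above can be illustrated with a minimal sketch. It makes several assumptions that are not in the original text: a finite list of member sentences stands in for the inferred cluster grammar, the distance between a sentence and a language is taken as the smallest distance to any member, a hypothetical `threshold` decides when a new cluster is opened, and a simple position-wise mismatch count (`hamming`) stands in for the distance computed by an error-correcting parser. All function names are illustrative.

```python
def distance_to_language(x, language, dist):
    """Smallest distance from sentence x to any sentence of a finite language."""
    return min(dist(x, y) for y in language)


def cluster_patterns(patterns, dist, threshold):
    """Assign each pattern to the nearest cluster, or open a new cluster."""
    clusters = []
    for x in patterns:
        best, best_d = None, None
        for members in clusters:
            d = distance_to_language(x, members, dist)
            if best_d is None or d < best_d:
                best, best_d = members, d
        if best is not None and best_d <= threshold:
            best.append(x)          # stands in for updating the cluster grammar
        else:
            clusters.append([x])    # stands in for inferring a new grammar
    return clusters


def hamming(x, y):
    """Stand-in distance for equal-length strings (substitutions only)."""
    return sum(a != b for a, b in zip(x, y))


if __name__ == "__main__":
    samples = ["aaaa", "aaab", "bbbb", "bbba", "aabb"]
    print(cluster_patterns(samples, hamming, threshold=1))
```

In a fuller implementation the member list would be replaced by a grammar, and `distance_to_language` by the minimum-distance output of the error-correcting parser for that grammar.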

Finally, using the error-correcting parsing techniques, a real-data
example of texture modeling and discrimination is presented. In
texture modeling, the idea of window operation is used. Texture patterns
are divided into fixed-size windows. Windowed patterns belonging to the
same class of texture are then characterized by a tree grammar. This tree
grammar is used for texture synthesis as well as discrimination. However,
in texture synthesis, the coherence between windowed patterns is also essential
to the overall pattern. It is proposed to use a higher-level syntax, for example,
another tree grammar, as a monitor for the placement of windowed patterns.
Consequently, structural distortion can be simulated by changing the
placement of windowed patterns, whereas local noise can be simulated by using
stochastic grammars for characterizing windowed patterns. In texture
discrimination, a set of SPECTAs, each constructed for one class of
texture, is used as the discriminator. An input windowed pattern is analyzed
by the SPECTAs and then classified according to the nearest-neighbor rule.
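
As a rough illustration of discrimination by substitution-error correction followed by the nearest-neighbor rule, the sketch below computes, for a labeled tree and an expansive tree grammar, the smallest number of node relabelings that makes the tree derivable, and then picks the class with the smallest count. This is a simplified stand-in and not the SPECTA construction itself. The tuple encoding of trees, the dictionary encoding of grammars, the class names `stripes` and `dots`, and the function names are all assumptions made for the example.

```python
import math
from functools import lru_cache


def substitution_distance(tree, grammar, start):
    """Minimum number of label substitutions so that `tree` is generated.

    A tree is (label, (child, ...)). A grammar maps each nonterminal to a list
    of productions (terminal_label, (child_nonterminal, ...)).
    Returns math.inf when the tree structure cannot be generated at all.
    """
    @lru_cache(maxsize=None)
    def cost(node, nonterminal):
        label, children = node
        best = math.inf
        for terminal, child_nts in grammar.get(nonterminal, ()):
            if len(child_nts) != len(children):
                continue                      # structure must match exactly
            total = 0 if terminal == label else 1
            for child, child_nt in zip(children, child_nts):
                total += cost(child, child_nt)
            best = min(best, total)
        return best

    return cost(tree, start)


def classify(tree, class_grammars):
    """Nearest-neighbor rule over {class name: (grammar, start symbol)}."""
    distances = {name: substitution_distance(tree, g, s)
                 for name, (g, s) in class_grammars.items()}
    return min(distances, key=distances.get), distances


if __name__ == "__main__":
    stripes = {"S": [("$", ("A", "A"))], "A": [("a", ("A",)), ("a", ())]}
    dots = {"S": [("$", ("B", "B"))], "B": [("b", ("B",)), ("b", ())]}
    window = ("$", (("a", (("a", ()),)), ("b", ())))
    print(classify(window, {"stripes": (stripes, "S"), "dots": (dots, "S")}))
```

Because only relabelings are considered, a window whose tree shape is not generated by a class grammar receives an infinite distance for that class.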

6.2.  Suggestions  for  Further  Research 

The proposed similarity measure and error-correcting parsing scheme
provide a formal model and a useful tool for recognition under uncertainty,
for example, in a clustering problem where the correct classifications of
the samples are unknown, or in a noisy environment, such as the analysis of
texture patterns. To improve the recognition results, the following problems
need further investigation:

(1) To improve the parsing efficiency. The weakness of error-correcting
parsing is its long computing time. In the case of
error-correcting parsing for context-free languages, a sequential
method has been proposed to reduce the computing time. However,
further improvement is still needed. In the texture discrimination
example, patterns are divided into fixed-size windows, and then
each window is classified by using a SPECTA. Both the window operation
and the SPECTA are parallel operations. Parallel processing techniques
are expected to be very useful in improving the computation
efficiency.

(2) The inference of updated grammars. - The knowledge of classification
obtained from using error-correcting parsers is accumulated
by way of updating pattern grammars. Unlike the existing
grammatical inference procedures, which often require all the
training data to be available at the same time, training data are
given sequentially in the grammar updating problem. Updating
a grammar so that it remains simple and effective is a problem for future
research.

(3) The inference of a set of weights associated with error
transformations. - Using a weighted distance can improve the
clustering results. The problem of finding a set of appropriate
weights from training samples requires further study.

(4) The inference of texture grammars. - Due to the noise and
variation of texture patterns, it is cumbersome to construct
texture grammars manually. To be more practical, the proposed
syntactic model for texture analysis needs an efficient
inference procedure. There are two problems to be studied: the
first is the inference of a stochastic tree grammar for windowed
patterns, and the second is the inference of placement rules for a
texture pattern. In the second problem, the regularity, or
the repetition, of the basic texture patterns has to be determined.
Other problems, such as choosing a suitable window size based
on the density or coarseness of textures, and finding the relations
between window size, the effectiveness of the texture grammar and
parsing efficiency, are also interesting problems to be
investigated.

(5) To apply the proposed syntactic texture model to analyze real-world
data, such as aerial photographs, X-ray images and LANDSAT data.

APPENDIX A

PROOF  OF  THE  CONSISTENCY  OF  THE  MULTIPLE  ERROR  MODEL 

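Assume that the deformation probabilities on terminal $a \in \Sigma$ are consistent
under the single-error model; i.e., equation (2.1) in Section 2.4 is satisfied.
By summing over all the cases in equation (2.2) we have

$$
\sum_{\alpha \in \Sigma^*} q(\alpha \mid a) = \{ q_D(a) \} + \Big\{ \sum_{b \in \Sigma} \big[ q_S(b \mid a) + q_I(b \mid a)\, q_D(a) \big] \Big\}
+ \Big\{ \sum_{l=2}^{\infty} \Big[ \prod_{j=1}^{l-1} \Big( \sum_{b_j \in \Sigma} q_I(b_j \mid a) \Big) \Big] \Big[ \sum_{b_l \in \Sigma} \big( q_S(b_l \mid a) + q_I(b_l \mid a)\, q_D(a) \big) \Big] \Big\}. \tag{A1}
$$

From equation (2.7), the first three terms of (A1) can be reduced to

$$
\Big[ 1 - \sum_{b \in \Sigma} q_I(b \mid a) \Big] + \Big[ \sum_{b \in \Sigma} q_I(b \mid a) \Big] q_D(a) \tag{A2}
$$

and the fourth and fifth terms of (A1) can be reduced to

$$
\sum_{l=2}^{\infty} \Big[ \prod_{j=1}^{l-1} \Big( \sum_{b_j \in \Sigma} q_I(b_j \mid a) \Big) \Big] \Big[ 1 - q_D(a) - \sum_{b_l \in \Sigma} q_I(b_l \mid a) + \sum_{b_l \in \Sigma} q_I(b_l \mid a)\, q_D(a) \Big]. \tag{A3}
$$

Therefore

$$
\sum_{\alpha \in \Sigma^*} q(\alpha \mid a) = (A2) + (A3)
= 1 - q_I(a)\,(1 - q_D(a)) + \sum_{l=2}^{\infty} \Big[ \prod_{j=1}^{l-1} q_I(a) \Big] \big[ (1 - q_I(a))(1 - q_D(a)) \big] = 1,
$$

where $q_I(a) = \sum_{b \in \Sigma} q_I(b \mid a)$. The last equality uses the geometric
series $\sum_{l=2}^{\infty} q_I(a)^{l-1} = q_I(a) / (1 - q_I(a))$, which holds for $q_I(a) < 1$.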
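
The consistency result can be checked numerically with a short sketch. It assumes that the single-error condition is $q_D(a) + \sum_b q_S(b \mid a) + \sum_b q_I(b \mid a) = 1$, and that a deformed string of length $l \ge 1$ arises from $l-1$ insertions followed either by a substitution or by one more insertion and a deletion, as in the grouping of terms in (A1). The probability values and the function name are hypothetical and chosen only for illustration.

```python
def total_deformation_probability(q_s, q_i, q_d, max_len=200):
    """Sum q(alpha | a) over all deformed strings alpha up to length max_len.

    q_s and q_i map each terminal b to q_S(b|a) and q_I(b|a); q_d is q_D(a).
    """
    qs_sum = sum(q_s.values())
    qi_sum = sum(q_i.values())
    if abs(qs_sum + qi_sum + q_d - 1.0) > 1e-9:
        raise ValueError("single-error probabilities must sum to 1")

    total = q_d                                  # alpha is the empty string
    last_symbol = qs_sum + qi_sum * q_d          # substitution, or insertion then deletion
    for length in range(1, max_len + 1):
        # length-1 leading insertions, then the last symbol
        total += qi_sum ** (length - 1) * last_symbol
    return total


if __name__ == "__main__":
    q_s = {"a": 0.7, "b": 0.1}     # hypothetical substitution probabilities
    q_i = {"a": 0.05, "b": 0.05}   # hypothetical insertion probabilities
    print(total_deformation_probability(q_s, q_i, q_d=0.1))
```

The truncated sum approaches 1 as `max_len` grows, since the omitted tail is a geometric series in $q_I(a)$.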


APPENDIX  B 


SEQUENTIAL  CLASSIFICATION  ALGORITHMS 

Let $\hat{e}$ be a parameter, $0 < \hat{e} < 1$.

Algorithm 3. Decision Algorithm

Input: String $a_1 a_2 \ldots a_n$.

Output: $C_l$, $1 \le l \le k$.

Method:

Step 1. Set $j = 0$ and compute $r = 1 - \max_i P(C_i)$. If $r < \hat{e}$, stop and
assign class $C_l$, where $P(C_l) = \max_i P(C_i)$. If $r \ge \hat{e}$, set
$j = 1$ and go to Step 2.

Step 2. Parse the $j$th input symbol by SPA (the Sequential Parsing Algorithm,
Algorithm 4).

Step 3. If Step 2 receives a parsing failure flag from all pattern
grammars, stop and assign class $C_l$, where
$p(C_l \mid a_1 a_2 \ldots a_{j-1} \alpha) = \max_i p(C_i \mid a_1 a_2 \ldots a_{j-1} \alpha)$; otherwise go
to Step 4.

Step 4. If $j = n$, stop and classify to $C_l$, where
$p(C_l \mid a_1 a_2 \ldots a_n) = \max_i p(C_i \mid a_1 a_2 \ldots a_n)$. If $j \ne n$, compute
$r = 1 - \max_i p(C_i \mid a_1 a_2 \ldots a_j \alpha)$ and go to Step 5.

Step 5. If $r < \hat{e}$, stop and assign $C_l$, where
$p(C_l \mid a_1 a_2 \ldots a_j \alpha) = \max_i p(C_i \mid a_1 a_2 \ldots a_j \alpha)$; otherwise, if $j = n$ then stop,
otherwise set $j = j + 1$. Go to Step 2.
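
The control flow of Algorithm 3 can be sketched as follows. The sketch assumes that the posteriors are obtained from the parser's likelihoods and the class priors by Bayes' rule, which the listing does not state explicitly, and it takes the likelihoods as precomputed lists rather than calling a parser. The fallback used when the complete string has zero likelihood under every grammar is also an assumption. The names `sequential_decision`, `prefix_likelihoods`, `full_likelihoods` and `e_hat` are hypothetical.

```python
def posterior(likelihoods, priors):
    """Bayes' rule: p(C_i | data) is proportional to p(data | G_i) * P(C_i)."""
    joint = [lk * pr for lk, pr in zip(likelihoods, priors)]
    total = sum(joint)
    if total == 0:
        return None                      # every grammar rejected the data
    return [v / total for v in joint]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def sequential_decision(prefix_likelihoods, full_likelihoods, priors, e_hat):
    """Return (class index, number of symbols parsed before stopping).

    prefix_likelihoods[j-1][i] stands for p(a_1...a_j alpha | G_i) and
    full_likelihoods[i] for p(a_1...a_n | G_i).
    """
    n = len(prefix_likelihoods)
    current = list(priors)                        # Step 1: j = 0
    if 1 - max(current) < e_hat:
        return argmax(current), 0
    for j in range(1, n + 1):                     # Step 2: parse the j-th symbol
        post = posterior(prefix_likelihoods[j - 1], priors)
        if post is None:                          # Step 3: failure in all grammars
            return argmax(current), j
        current = post
        if j == n:                                # Step 4: whole string parsed
            final = posterior(full_likelihoods, priors)
            return argmax(final if final is not None else current), n
        if 1 - max(current) < e_hat:              # Step 5: risk is small enough
            return argmax(current), j
    return argmax(current), n


if __name__ == "__main__":
    prefixes = [[0.4, 0.4], [0.38, 0.02], [0.3, 0.0]]
    print(sequential_decision(prefixes, [0.3, 0.0], [0.5, 0.5], e_hat=0.1))
```

In the usage example the decision is reached after two of the three symbols, which is the saving in computing time that the sequential method is intended to provide.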
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Algorithm  4.  Sequential  Parsing  Algorithm 

Input: A SCFG $G_i = (N_i, \Sigma_i, P_i, S_i)$ and an input string $a_1 a_2 \ldots a_j$,
$1 \le j \le n$.

Output: $p(a_1 a_2 \ldots a_j \mid G_i)$ and $p(a_1 a_2 \ldots a_j \alpha \mid G_i)$.

Method:

Step 1. Set $j = 0$, add item $[Z \to \cdot S_i, 0, 1, 1]$ to $I_j$.

Step 2. (a) If $[A \to \alpha \cdot B\beta, l, p, r]$ is in $I_j$, and $B \to \gamma$ is in $P_i$
with probability $q$, add item $[B \to \cdot \gamma, j, q, 0]$ to $I_j$.

(b) If $[A \to \alpha \cdot, l, p_1, r]$ is in $I_j$, for all items in $I_l$ of
the form $[B \to \beta \cdot A\gamma, k, p_2, s]$, add item
$[B \to \beta A \cdot \gamma, k, p_3, 0]$ to $I_j$, where $p_3 = p_1 p_2$, unless an item of the
form $[B \to \beta A \cdot \gamma, k, q, t]$ is already in $I_j$. If this
is the case, set $q = q + p_1 p_2$.

Step 3. (a) For each item in $I_j$ of the form $[A \to \alpha \cdot B\beta, l, q, r]$,
where $l < j$, find in $I_l$ all the items of the form
$[C \to \mu \cdot A\gamma, k, t, s]$. Suppose that there are $m$ such
items with values of $s$ equal to $s_1, s_2, \ldots, s_m$,
respectively. Set $r = q(s_1 + s_2 + \ldots + s_m)$.

(b) Locate all items in $I_j$ of the form $[A \to \alpha \cdot B\beta, j, q, r]$.
If there are $n$ such items, number these items such
that the $k$th of them is denoted as $[A_k \to \alpha_k \cdot B_k \beta_k, j, q_k, r_k]$.
For the $k$th item, locate in $I_j$ all items of the form
$[C \to \mu \cdot A_k \gamma, j, t, s]$. If there are $m_k$ such items, we
denote the values of $s$ in these items as $s_1, s_2, \ldots, s_{m_k}$;
then $r_k = q_k(s_1 + s_2 + \ldots + s_{m_k})$, $k = 1, 2, \ldots, n$. Note
that either $s_j$ is determined by Step 3(a) already, or
$s_j$ is not known and is one of the unknowns $r_j$
($j = 1, 2, \ldots, n$). This gives $n$ linear equations in
$n$ unknowns, from which $r_1, r_2, \ldots, r_n$ can be determined.

Step 4. If $j = 0$, go to Step 7; otherwise, go to Step 5.

Step 5. Compute $p(a_1 a_2 \ldots a_j \mid G_i)$ as follows: if an item of the form
$[Z \to S_i \cdot, 0, p, r]$ is in $I_j$, set $p(a_1 a_2 \ldots a_j \mid G_i) = p$;
otherwise $p(a_1 a_2 \ldots a_j \mid G_i) = 0$.

Step 6. Apply $p(a_1 a_2 \ldots a_j \mid G_i)$ and $p(a_1 a_2 \ldots a_j \alpha \mid G_i)$, obtained
from Step 5 and Step 8, to Algorithm 3. If it is decided
to continue, go to Step 7; otherwise, stop and classify.

Step 7. Set $j = j + 1$. For each item in $I_{j-1}$ of the form
$[A \to \alpha \cdot a_j \beta, l, q, r]$, add item $[A \to \alpha a_j \cdot \beta, l, q, 0]$ to $I_j$, and go
to Step 8. If no item of the form $[A \to \alpha \cdot a_j \beta, l, q, r]$
exists in $I_{j-1}$, set the parsing failure flag and
$p(a_1 a_2 \ldots a_j \alpha \mid G_i) = 0$, $p(a_1 a_2 \ldots a_j \mid G_i) = 0$, and stop.

Step 8. Locate the items which are added to $I_j$ by Step 7. Suppose
that there are $n$ such items, and number the $m$th of them as
$[A \to \alpha a_j \cdot \beta, l_m, p_m, r_m]$. Find all items in $I_{l_m}$ of the form
$[B \to \gamma \cdot A\delta, k, q, s]$ and suppose that there are $\ell_m$ such items
with parameters $s$ denoted as $s_1, s_2, \ldots, s_{\ell_m}$; then
$p(a_1 a_2 \ldots a_j \mid G_i) + p(a_1 a_2 \ldots a_j \alpha \mid G_i) = \sum_{m=1}^{n} p_m \sum_{l=1}^{\ell_m} s_l$. Go to
Step 2.

HIGHWAY GRAMMAR


$G_h = (V, P, r, S)$ over $\langle \Sigma, r \rangle$ where,

V = {S,Hq,Xq,Ap  . .g,  Bp..g,  Dp**, 

Mp • *7»  h,b,$} 

r($) = {1}
r(h) = {0, 1, 3}
r(b) = {0, 1, 3}


Figure C-1 (continued): (f) Group F, M


APPENDIX D

THE GRAMMARS AND RESULTS OF THE
CHARACTER RECOGNITION EXAMPLE


D.1 Character Grammars

(a) Character A

$G_A = (V_A, r_A, P_A, A)$ where

$V_A = \{A, A_1, A_2, N_a, N_d, N_c\}$, $\Sigma_A = \{1, a, c, d\}$

$r_A(1) = \{2\}$, $r_A(a) = \{0, 2\}$, $r_A(c) = \{0, 1\}$, $r_A(d)$ (value not legible)

$P_A$: (tree-structured productions not legible), $N_a \to a$, $N_d \to d$, $N_c \to c$

(b) Character C

$G_C = (V_C, r_C, P_C, C)$ where

$V_C = \{C, C_1, C_2, C_3, N_d, N_f\}$, $\Sigma_C = \{1, a, b, d, f\}$

$r_C(1) = \{2\}$, $r_C(a) = \{1\}$, $r_C(b) = \{1\}$, $r_C(d) = \{0, 1\}$, $r_C(f) = \{0\}$

$P_C$: (tree-structured productions not legible)

$N_d \to d$, $N_b \to b$


(d) Character E

$G_E = (V_E, r_E, P_E, E)$ where

$V_E = \{E, E_1, E_2, E_3, N_d\}$, $\Sigma_E = \{1, b, d\}$

$r_E(1) = \{2\}$, $r_E(b) = \{1, 2\}$, $r_E(d) = \{0, 1\}$

$P_E$: (tree-structured productions over $E_1$, $E_2$, $E_3$ and $N_d$ not legible), $N_d \to d$


D.2 Classification Results of 26 Test Patterns


APPENDIX F

DISCRIMINATION GRAMMARS FOR PATTERNS D22, D3*, D38, AND D68


APPENDIX G

ALGORITHMS  FOR  THE  COMPUTATION  OF  THE  DISTANCE  BETWEEN 
TWO  STRINGS,  AND  THE  MINIMUM  SPANNING  TREE 
OF  A SET  OF  PATTERNS 


G.1. The Distance Between Two Strings


Algorithm G.1


Input: Two strings $x = a_1 a_2 \ldots a_n$, $y = b_1 b_2 \ldots b_m$

Output: d(x,y)

Method:

Step 1. D(0,0) = 0

Step 2. DO i = 1, n
          D(i,0) = D(i-1,0) + 1
        DO j = 1, m
          D(0,j) = D(0,j-1) + 1

Step 3. DO i = 1, n
        DO j = 1, m
          e1 = D(i-1,j-1) + 1 if a_i ≠ b_j, or e1 = D(i-1,j-1) if a_i = b_j
          e2 = D(i-1,j) + 1
          e3 = D(i,j-1) + 1
          D(i,j) = min(e1, e2, e3)

Step 4. d(x,y) = D(n,m), exit
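
A direct transcription of Algorithm G.1 into Python is sketched below, assuming unit cost for each substitution, insertion and deletion. The function name `string_distance` is illustrative; the table `D` and the quantities `e1`, `e2`, `e3` follow the listing.

```python
def string_distance(x, y):
    """Minimum number of substitutions, insertions and deletions turning x into y."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]      # Step 1: D(0,0) = 0

    for i in range(1, n + 1):                      # Step 2: first column and row
        D[i][0] = D[i - 1][0] + 1
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + 1

    for i in range(1, n + 1):                      # Step 3: fill the table
        for j in range(1, m + 1):
            e1 = D[i - 1][j - 1] + (0 if x[i - 1] == y[j - 1] else 1)
            e2 = D[i - 1][j] + 1
            e3 = D[i][j - 1] + 1
            D[i][j] = min(e1, e2, e3)

    return D[n][m]                                 # Step 4


if __name__ == "__main__":
    print(string_distance("kitten", "sitting"))
```

The table has (n+1)(m+1) entries, so both time and storage grow with the product of the two string lengths.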


G.2.  The  Minimum  Spanning  Tree 


Algorithm  G.2 


Input: $X = \{x_1, x_2, \ldots, x_n\}$, a set of sentences

Output: The minimum spanning tree of X

Method:

Step 1. Assume that there are n nodes.

Step 2. Compute $d(x_i, x_j)$ for all i, j. Let $d(x_i, x_j)$ be the
length of the arc connecting nodes i and j, and denote it
as d(i,j).

Step 3. List all arcs (i,j) in the order of increasing d(i,j).

Step 4. Put the first arc (p,q) on the list into list A.

Step 5. Put the next arc on the list into A, except if a circuit
can be found with the arcs already in A.

Step 6. If all nodes are connected, stop; otherwise go to Step 5.